GCP IoT connection in bizarre state - google-cloud-platform

I have a Google IoT app that's been running well for quite a while. The devices send telemetry, and occasionally receive commands from the cloud.
One of my devices recently got into a state where it was connected and sending telemetry, which was being received just fine, but trying to send a command to it resulted in a "that device is not connected" error. The console page for that device showed the last error as "[9] mqtt: The connection broke or was closed by the client". I could not send commands either from my software or from the console, even though the device was connected and sending telemetry.
My device will recover from disconnections just fine--but if it is perfectly connected and running fine from its point of view, it has no reason to. And I can't tell it to reboot remotely because I can't send commands to it. If the device remains in this state, it essentially becomes an orphan.
My questions, then, are:
(1) How is it possible to get into this state, and is there any way for me to avoid it, or detect it from the firmware side so that I can reboot?
(2) Is there any way to fix this from the cloud side--to force a reconnection when I can't reach the device?
(3) Am I saved by the fact that my device will eventually disconnect and reconnect when the JWT expires? Will this "fix" the connection to be bidirectional?

Related

Google Cloud - IoT Core - Config sent every 1 hour reboot the device

I have an ESP8266 with a relay to turn a light on and off.
Everything works great, but IoT Core sends a configuration every hour, which causes the device to reboot. When the device starts again, there is no guarantee that the initial state is the desired one.
Is there any way to avoid this automatic config?
Thanks.
IoT Core sends the latest configuration to the device each time the device (re)connects, to make sure it is up to date, even if new configuration was sent to it while it was disconnected. This is expected IoT Core behaviour.
As mentioned in the other answer, what is probably happening is that your device is not sending data during that period, which makes the connection time out after one hour. The device then reconnects, receives the latest configuration, and that causes it to reboot.
You have many options to avoid this:
Implement keep alive to keep the connection open.
Refresh the JWT before it expires (this effectively restarts the timer for timeout too).
If you are not expecting configuration to be sent from IoT Core to the device, do not subscribe to the configuration MQTT topic.
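The second option, refreshing the JWT before it expires, comes down to a small timing calculation. The following is a minimal sketch, not IoT Core's or any client library's API: the function name, the one-hour lifetime (taken from the behaviour described in this thread) and the five-minute safety margin are all assumptions for illustration.

```python
import time

TOKEN_LIFETIME_S = 3600  # assumption: one-hour tokens, as described in this thread
REFRESH_MARGIN_S = 300   # assumption: refresh five minutes early (hypothetical value)


def seconds_until_refresh(issued_at, now,
                          lifetime_s=TOKEN_LIFETIME_S,
                          margin_s=REFRESH_MARGIN_S):
    """Return how long the device can wait before it should create a new JWT.

    A result of 0.0 means the refresh is already due.
    """
    refresh_at = issued_at + lifetime_s - margin_s
    return max(0.0, refresh_at - now)


if __name__ == "__main__":
    issued_at = time.time()
    # Right after issuing a token, the wait is roughly lifetime minus margin.
    print(seconds_until_refresh(issued_at, now=issued_at))
```

A device loop could call this periodically and, when it returns 0.0, create a new token and reconnect at a moment of its own choosing instead of being cut off by the server.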
I solved it by changing the logic inside the device. Every hour the JWT needed to be refreshed, which made IoT Core send a message to the device with the new status.

Many "MQTT disconnect - ALREADY_EXISTS" error in Google Cloud IoT console log

I use Google IoT Core in my project as an MQTT broker to connect IoT embedded devices, based on an Atmel MCU, to Google Cloud Platform.
In the platform log, I see many "MQTT DISCONNECT" errors.
jsonPayload: {
  disconnectType: "SERVER"
  eventType: "DISCONNECT"
  protocol: "MQTT"
  resourceName: "projects/xxxxxxx/locations/europe-west1/registries/xxxxxxxx/devices/1234567890"
  serviceName: "cloudiot.googleapis.com"
  status: {
    code: 6
    description: "ALREADY_EXISTS"
    message: "SERVER: The connection was closed because there is another active connection with the same device ID."
  }
}
labels: {
  device_id: "d1234567890"
}
logName: "projects/xxxxxxxxx/logs/cloudiot.googleapis.com%2Fdevice_activity"
The error is generated when the device boots up and connects to the MQTT server.
Despite this error, the connection is successful, as are the subscription to topics and message publishing.
I understand that the previous connection was not closed gracefully, but that is impossible given the nature of the embedded device: it is meant to be always connected and is eventually turned off by disconnecting the power supply, so it cannot send a disconnect message to the server beforehand.
The device ID is always the same at every reconnection, but unique per device; I use the chip serial number, as in some of Google's examples.
My question is whether there is a solution to this error. It can be ignored in the development phase, but it would be unwanted behavior in the production environment.
I am thinking that you want the particular error excluded because of "hygiene" reasons. I am thinking that when you eyeball logs, you are on the alert for error type messages and ones of this nature would be considered distracting.
Fortunately, Stackdriver logging provides an exclusion capability. You can provide sets of filters that cause Stackdriver to discard log entries that you explicitly choose not to keep. This is described in detail here:
https://cloud.google.com/logging/docs/exclusions
I found the illustration at the page particularly helpful.
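What you will likely want to do is formulate a query that matches exactly this type of message. I haven't tested it, but something loosely like the following, with your own project ID in place of PROJECT_ID (the logName comparison needs the full log name, as it appears in the log entry):
logName = "projects/PROJECT_ID/logs/cloudiot.googleapis.com%2Fdevice_activity"
jsonPayload.eventType = "DISCONNECT"
jsonPayload.status.description = "ALREADY_EXISTS"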
Once you have a filter expression that matches just the messages to be excluded, you can then use that filter as an exclusion filter.
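Before installing an exclusion, it can help to check the three comparisons against a sample entry. The sketch below is illustrative only: `build_exclusion_filter` and `would_be_excluded` are hypothetical helper names, the matcher is a local stand-in rather than the Logging service's own evaluation, and the full log name format is assumed from the log entry shown in the question.

```python
LOG_ID = "cloudiot.googleapis.com%2Fdevice_activity"


def build_exclusion_filter(project_id):
    """Build a filter string for the duplicate-connection disconnect entries."""
    log_name = f"projects/{project_id}/logs/{LOG_ID}"
    clauses = [
        f'logName = "{log_name}"',
        'jsonPayload.eventType = "DISCONNECT"',
        'jsonPayload.status.description = "ALREADY_EXISTS"',
    ]
    return " AND ".join(clauses)


def would_be_excluded(entry, project_id):
    """Local stand-in for the filter: apply the same three comparisons to a dict."""
    payload = entry.get("jsonPayload", {})
    return (
        entry.get("logName") == f"projects/{project_id}/logs/{LOG_ID}"
        and payload.get("eventType") == "DISCONNECT"
        and payload.get("status", {}).get("description") == "ALREADY_EXISTS"
    )


if __name__ == "__main__":
    # "my-project" is a placeholder project ID.
    print(build_exclusion_filter("my-project"))
```

Keeping the comparisons this narrow means other DISCONNECT entries, for example ones with a different status description, are still logged.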
The MQTT connection to the device bridge has a few special properties that can cause disconnects. For device connections, you are limited to one connection per device (or per device ID, in the case of a gateway). It looks to me like you're trying to connect the same device twice, and that is causing a disconnect.
It's possible that you have a client that tries to open two connections, or that you are connecting a second client. If you connect the same device twice, the device will be disconnected. Maybe your client is set up to open multiple channels, or your application logic is reconnecting without disconnecting.
There are various other reasons for disconnects, for example publishing with the incorrect QoS or to an invalid topic, but this doesn't appear to be one of them, given that publishing works on subsequent connections.

Why is my AWS DeepLens unable to connect to WiFi?

I am setting up my AWS DeepLens and all the steps have been successful until I try to connect to my home WiFi. How do I fix this issue?
I created a hotspot on my phone to test against a different network and this connection was successful. Then, I switched back to my home WiFi and it connected successfully.
This section of the troubleshooting guide will also fix the problem.
We found that the AWS DeepLens only has one network adapter which it uses both for its own hotspot and connecting to the network. If you are connected to it via any other means (e.g. via a phone) it will throw a hissy and start dropping the connection, repeatedly and seemingly randomly.
When we connected a monitor directly we then found it was stuck on a viewable password prompt, hence why it was not connecting to our network.
The best method by far (and in our experience, the only usable option) is to connect directly to the device so you can see what it is doing. To do this you need a USB keyboard and mouse, and a mini-HDMI to HDMI cable to hook up a monitor. This frees up the network card to do only one thing.
When connecting please note that the default admin password on ours was "aws_cam". This does not seem to be noted anywhere in the documentation. This will change when you go through the setup process and sync it with your AWS account.
Repeat the setup process by inserting a pin in the hole at the back of the DeepLens. Wait a few seconds; the Wi-Fi indicator (the middle light) will blink, and then you can connect to the DeepLens wireless network. Then open http://deeplens.config, where you can configure your home Wi-Fi and complete the setup.

How to address RdKafka::ERR__TIMED_OUT and RdKafka::ERR__MSG_TIMED_OUT in librdkafka?

I am working with the C++ Kafka client librdkafka. Looking at the examples https://github.com/edenhill/librdkafka/blob/master/src-cpp/rdkafkacpp.h and https://github.com/edenhill/librdkafka/blob/master/examples/rdkafka_example.cpp, it seems that there is no explicit step for connecting to a broker. How do I handle reconnection for these connection errors? How do I check the connection status?
librdkafka abstracts all broker connectivity away from the application. It will always attempt to keep a connection to each known broker (learnt either through metadata.broker.list or from the broker list returned by the initial bootstrap brokers).
Upon a connection error, librdkafka will attempt to connect again, forever.
If none of the brokers can be connected to, the ALL_BROKERS_DOWN event will be triggered, but there is currently no corresponding event for when brokers begin to come back online.
The application doesn't need to worry, though, since librdkafka takes care of all reconnects and message retransmissions in the background, and it will keep trying to get the messages produced until either message.timeout.ms or message.send.max.retries is exceeded.
There's more information on this in the introduction guide:
https://github.com/edenhill/librdkafka/blob/master/INTRODUCTION.md
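The "retry until either limit is exceeded" rule can be pictured with a small simulation. This is not librdkafka's code or API, only a sketch of the rule as described above; `produce_with_limits`, `try_send` and `now_ms` are hypothetical names. As far as I understand it, running past message.timeout.ms is what surfaces to the application as ERR__MSG_TIMED_OUT.

```python
def produce_with_limits(try_send, message_timeout_ms, max_retries, now_ms):
    """Call try_send() until it succeeds or one of the two limits is exceeded.

    try_send: callable returning True on success (stands in for one send attempt).
    now_ms:   callable returning the current time in milliseconds.
    Returns "delivered", "timed_out" or "retries_exhausted".
    """
    start = now_ms()
    retries = 0
    while True:
        if try_send():
            return "delivered"
        if now_ms() - start >= message_timeout_ms:
            return "timed_out"
        if retries >= max_retries:
            return "retries_exhausted"
        retries += 1


if __name__ == "__main__":
    # Fake clock that advances 100 ms every time it is read.
    ticks = iter(range(0, 10_000, 100))
    result = produce_with_limits(lambda: False, 250, 100, lambda: next(ticks))
    print(result)
```

Whichever limit is hit first ends the attempt, so a generous retry count does not help if the time budget is short, and vice versa.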

How does a server socket know the network cable is unplugged on Windows using C++?

I am developing a client-server application on Windows using C++ and the Winsock library. It works fine, but if the server is on the network and has started listening, and I then remove the network cable, the server doesn't show any error in any thread. How can the server socket know that the network cable is unplugged?
If anybody knows, please help me.
While it should be possible to detect that the network cable is unplugged on the host, you will still have the same problem if the network is disrupted somewhere else between your server and the clients.
One common (if not the most common) way to solve this is to have a "keep-alive" message being sent. If no reply to that message is received within some timeout you simply close the connection and release all resources associated with it.
Edit
A "keep-alive" message is like using the "ping" command to see if a remote machine can be reached. It is simply a message that is sent, either by the server or the client (it doesn't matter who initiate it) to see if the other end of the connection is alive and can be reached.
It can be as simple as sending the string "Are you there?" and expecting a reply containing "Yes I am". If you send it once every minute and don't get a reply within (for example) one minute, you can consider the connection dead. The other end, which receives the "Are you there?", knows it will get the message once every minute. If it hasn't arrived for two minutes, then the sender is no longer reachable.
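The exchange described above can be sketched with plain sockets. The example uses Python's standard socket module rather than Winsock, purely to keep it short; the names `PING`, `PONG`, `answer_keepalive` and `peer_is_alive` are made up for illustration, and the timeout values are arbitrary.

```python
import socket
import threading

PING = b"Are you there?"
PONG = b"Yes I am"


def _recv_exact(sock, n):
    """Read up to n bytes, stopping early only if the peer closes the connection."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            break
        buf += chunk
    return buf


def answer_keepalive(sock):
    """Responding side: answer a single keep-alive probe."""
    if _recv_exact(sock, len(PING)) == PING:
        sock.sendall(PONG)


def peer_is_alive(sock, timeout_s):
    """Probing side: send one probe and report whether the reply arrived in time."""
    sock.settimeout(timeout_s)
    try:
        sock.sendall(PING)
        return _recv_exact(sock, len(PONG)) == PONG
    except OSError:  # includes timeouts and broken connections
        return False


if __name__ == "__main__":
    a, b = socket.socketpair()
    responder = threading.Thread(target=answer_keepalive, args=(b,))
    responder.start()
    print(peer_is_alive(a, 2.0))  # the responder thread answers this probe
    responder.join()
    print(peer_is_alive(a, 0.2))  # nobody answers, so the probe times out
    a.close()
    b.close()
```

When the probe fails, the caller would close the socket and release whatever resources are tied to that connection, as described above.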
If the protocol can't be modified to add such messages, then see if some other message can be used instead.
Also, remember that the best, and in some cases the only, way to know if something is wrong with a connection is to attempt to read from the socket.
You can unplug a network and then plug it back in, or your Wi-Fi laptop can lose reception for a second and then pick it back up. It would be frustrating if such resumable cases were treated as an error in all the programs we use.
From this Winsock "newbie" FAQ:
The previous question deals with detecting when a protocol connection is dropped normally, but what if you want to detect other problems, like unplugged network cables or crashed workstations? In these cases, the failure prevents notifying the remote peer that something is wrong. My feeling is that this is usually a feature, because the broken component might get fixed before anyone notices, so why demand that the connection be reestablished?
If you feel you have a "special needs" situation you can be aggressive with timeouts. But I wouldn't do that unless there was a really good reason.