GCP - IoT Core - Monitoring messages

I'm kind of new to the IoT world and trying to learn a bit about GCP products.
I've made a simple Python app that uses Paho to send a message to an IoT topic (IoT Core in GCP).
Everything apparently works just fine, but I was wondering if I could see, in Stackdriver, the content of a message that the device had sent.
I have already enabled debug logging for it, but the message didn't show up.
Publish log in Stackdriver:
{
  insertId: "78178yfwnl"
  jsonPayload: {
    eventType: "PUBLISH"
    protocol: "MQTT"
    publishFromDeviceTopicType: "EVENTS"
    resourceName: "projects/demoiot/locations/us-central1/registries/iotchicago/devices/2753540639583"
    serviceName: "cloudiot.googleapis.com"
    status: {
      code: 0
    }
  }
  labels: {
    device_id: "us_chi"
  }
  logName: "projects/demoiot/logs/cloudiot.googleapis.com%2Fdevice_activity"
  receiveTimestamp: "2018-11-20T11:10:01.123928203Z"
  resource: {
    labels: {
      device_num_id: "2753540639583"
      device_registry_id: "iotchicago"
      location: "us-central1"
      project_id: "demoiot-223010"
    }
    type: "cloudiot_device"
  }
  severity: "DEBUG"
  timestamp: "2018-11-20T11:10:01.104415969Z"
}

No telemetry data is logged by our system. Because permissions on the logs differ from permissions on the telemetry itself, the potential for privacy concerns was such that we didn't want to touch that.
You CAN write to Stackdriver explicitly, though, so one way to do this would be to have a Cloud Function tied to the Pub/Sub topic where the telemetry is being written, and then have that function write messages out to Stackdriver with the payload data. You could also do it with Dataflow if Java is more your thing.
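For instance, a minimal sketch of such a function in Python, assuming a Pub/Sub-triggered Cloud Function attached to the registry's telemetry topic (the function name and the use of the deviceId attribute are illustrative):

import base64
import logging

def log_telemetry(event, context):
    """Pub/Sub-triggered Cloud Function that echoes telemetry to Stackdriver."""
    # Pub/Sub message data arrives base64-encoded.
    payload = base64.b64decode(event["data"]).decode("utf-8")
    # IoT Core attaches the device ID as a message attribute.
    device_id = event.get("attributes", {}).get("deviceId", "unknown")
    # Anything logged here is visible in Stackdriver Logging.
    logging.info("Telemetry from %s: %s", device_id, payload)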
A teammate also pointed out that using the /state/ MQTT topic to write out the device's state and checking it in the GCP console is another good way to quickly test/check. In the device details, there's a "Configuration and State History" tab that will show it to you.
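As a quick illustration with the Paho client from the question, publishing state looks roughly like this (a sketch that assumes `client` is already connected and authenticated to the IoT Core MQTT bridge; the device ID is just an example):

import json

device_id = "us_chi"  # example device ID from the registry
state_topic = "/devices/{}/state".format(device_id)

# Publish the device's current state; it should then show up under the
# "Configuration and State History" tab on the device's detail page.
state = json.dumps({"status": "ok", "battery": 87})
client.publish(state_topic, state, qos=1)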

Related

How to get a more informative crash log from GCP Cloud Functions

I am using GCP Cloud Functions, and the function crash log is not very informative, like the one below, which only mentions finished with status: 'crash'.
I'm wondering how we can get a crash stack trace to find the bug more easily?
Thank you.
{
  insertId: "000000-e410caa4-0287-4e3f-b80d-c215e06a5430"
  labels: {
    execution_id: "pbc4oxnzl1ji"
  }
  logName: "projects/sturdy-hangar-2/logs/cloudfunctions.googleapis.com%2Fcloud-functions"
  receiveTimestamp: "2020-07-04T07:01:52.837476035Z"
  resource: {
    labels: {
      function_name: "gen_thumbnail"
      project_id: "sturdy-hangar-2"
      region: "us-central1"
    }
    type: "cloud_function"
  }
  severity: "DEBUG"
  textPayload: "Function execution took 91 ms, finished with status: 'crash'"
  timestamp: "2020-07-04T07:01:51.834276757Z"
  trace: "projects/sturdy-hangar-2/traces/1bbc73e71f5057d38ecb65be525b233b"
}
I understand that ambiguous crash logs without stack traces can be quite frustrating to troubleshoot and test. This is a known issue with Cloud Functions as of now and is actively being investigated by Cloud Functions specialists. I would suggest that you take a look at the Public Tracker and click the star icon to subscribe to the issue for further updates and click the bell icon to be notified of updates via email.
As all issues require a dedicated team of specialists to investigate them accordingly, we do not have an ETA on the resolution. Please rest assured that we are doing our best to get this issue resolved.

GCP Cloud Scheduler throws ERROR for an HTTP target type

I have created a GCP Cloud Scheduler job to run every 15 minutes. It is supposed to call an API from my Node.js application.
In the console the job definition looks like this:
Description: A job
Frequency: */15 * * * *
Timezone: Central Standard Time
Target: HTTP
URL: https://<company url>/api/email-reminder/
HTTP method: GET
Auth header: Add OIDC token
Service account: xxxxxxxxxxx-compute@developer.gserviceaccount.com
When it runs it returns the following in the logs:
httpRequest: {
}
insertId: "15wxxxxxxge1lv"
jsonPayload: {
  #type: "type.googleapis.com/google.cloud.scheduler.logging.AttemptFinished"
  jobName: "projects/<project name>/locations/us-central1/jobs/xxxxxxxxx-scheduler-emailreminders-1"
  status: "UNKNOWN"
  targetType: "HTTP"
  url: "https://<company url>/api/email-reminder/"
}
logName: "projects/<project name>/logs/cloudscheduler.googleapis.com%2Fexecutions"
receiveTimestamp: "2019-11-14T04:45:50.280446452Z"
resource: {
  labels: {…}
  type: "cloud_scheduler_job"
}
severity: "ERROR"
timestamp: "2019-11-14T04:45:50.280446452Z"
}
How do I find out more information about the error?
The default attempt deadline (timeout) for Cloud Scheduler jobs is 180s; you can change it via a gcloud command:
gcloud scheduler jobs update http my-super-job --attempt-deadline 540s
You can also see the complete info for your jobs with these commands:
gcloud scheduler jobs list
gcloud scheduler jobs describe my-super-job
I've seen similar problems recently using Cloud Scheduler for HTTPS targets. Every so often, the scheduler will fail, and all I get is a log message like yours.
Viewing in the Logs Viewer, the key parts of the log are, in the log header:
"status":"RESOURCE_EXHAUSTED",
"#type":"type.googleapis.com/google.cloud.scheduler.logging.AttemptFinished"
and in the log data:
httpRequest: {
  status: 429
}
jsonPayload: {
  #type: "type.googleapis.com/google.cloud.scheduler.logging.AttemptFinished"
  jobName: "projects/joburlhere"
  status: "RESOURCE_EXHAUSTED"
  targetType: "HTTP"
  url: "https://urlgoeshere"
}
severity: "ERROR"
"Resource exhausted" is the description of the 429 error code.
There's a description of this code here:
https://cloud.google.com/apis/design/errors
429 RESOURCE_EXHAUSTED Either out of resource quota or reaching rate limiting. The client should look for google.rpc.QuotaFailure error detail for more information.
Given that I'm running this job once an hour, and the receiver is a very modest Cloud Function, I don't think I'm doing anything to cause resource exhaustion. So I think it's a recurring transient problem with Google's cloud infrastructure. I'm guessing that the Cloud Function was unavailable for that particular request, and because I set up the job with default settings, the Scheduler did not retry.
Additionally, it is possible to configure a Scheduler Job to retry in case of failure. This functionality is not shown in the web console, but you can control it using the gcloud command.
The default setting is not to retry.
Look at the --max-retry-attempts flag.
https://cloud.google.com/sdk/gcloud/reference/scheduler/jobs/update/http
There are similar controls for Pub/Sub jobs:
https://cloud.google.com/sdk/gcloud/reference/scheduler/jobs/update/pubsub
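For example, to allow a few retries on the HTTP job above (a hedged sketch; the job name is the same placeholder used earlier):
gcloud scheduler jobs update http my-super-job --max-retry-attempts 3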
It's because the HTTP endpoint you specified doesn't return a response within the default attempt deadline.
Refer to the link.

Why does my Google Cloud instance often shut down automatically by itself?

My Google Cloud instance (Compute Engine) was shut down automatically for an unknown reason.
Here is the event log for the instance:
{"shutdownEvent":{},"bootCounter":"6","#type":"type.googleapis.com/cloud_integrity.IntegrityEvent"}
{
  insertId: "0"
  jsonPayload: {
    #type: "type.googleapis.com/cloud_integrity.IntegrityEvent"
    bootCounter: "6"
    shutdownEvent: {
    }
  }
  logName: "projects/<xxxx>/logs/compute.googleapis.com%2Fshielded_vm_integrity"
  receiveTimestamp: "2019-05-09T10:14:29.112673979Z"
  resource: {
    labels: {
      instance_id: "<xxxx>"
      project_id: "<xxxx>"
      zone: "us-west1-b"
    }
    type: "gce_instance"
  }
  severity: "NOTICE"
  timestamp: "2019-05-09T10:14:26.928978844Z"
}
gcloud compute instances update INSTANCE_NAME --shielded-learn-integrity-policy
The instance must be running and have Integrity Monitoring enabled.
I don't know if this is secure, though. Read more:
https://cloud.google.com/compute/docs/instances/integrity-monitoring#updating-baseline
This seems to be a Shielded VM with "integrity monitoring" enabled, and most likely this is caused by an integrity validation failure.
You might use Stackdriver Advanced Logs Filters to try to get more details about the reason behind the shutdown; for example, sometimes instances are migrated (Live Migration), which causes the behavior reported.
Some filters that may help you determine what caused the shutdown on instances are:
To check for migrations, use the following entries in a Stackdriver Advanced Logging filter:
compute.instances.migrationOnHostMaintenance
OR
compute.instances.migrateOnHostMaintenance
For hostError, you can use the following filter:
"compute.instances.hostError"
For VM Instance Stop:
resource.type="gce_instance"
protoPayload.methodName="v1.compute.instances.stop"
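Putting these together, a combined filter for the migration check could look something like this (a sketch built from the entries above; adjust the method name as needed):
resource.type="gce_instance"
protoPayload.methodName="compute.instances.migrateOnHostMaintenance"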
I had the same problem and confirmed with GCP that my VM had moved from one physical host to another.
Please feel free to open a case, and Google customer support will help you with diagnostics.

Google Dataflow - Invalid Regional Endpoint - Impossible to set region on template from Node.js client

I have a pipeline stored as a template. I'm using the Node.js client to run this pipeline from a Cloud Function. Everything works fine, but when I need to run this template in different regions I get errors.
According to the documentation, I can set it through the location parameter in the payload:
{
  projectId: 123,
  resource: {
    location: "europe-west1",
    jobName: `xxx`,
    gcsPath: 'gs://xxx'
  }
}
That gives me the following error:
The workflow could not be created, since it was sent to an invalid regional endpoint (europe-west1).
Please resubmit to a valid Cloud Dataflow regional endpoint.
I get the same error if I move the location parameter out of the resource node, such as:
{
  projectId: 123,
  location: "europe-west1",
  resource: {
    jobName: `xxx`,
    gcsPath: 'gs://xxx'
  }
}
If I set the zone in the environment and remove the location, such as:
{
  projectId: 123,
  resource: {
    jobName: `xxx`,
    gcsPath: 'gs://xxx',
    environment: {
      zone: "europe-west1-b"
    }
  }
}
I do not get any errors anymore, but the Dataflow UI tells me the job is running in us-east1.
How can I run this template while providing the region/zone I want?
As explained here, there are actually two endpoints:
dataflow.projects.locations.templates.launch (API Explorer)
dataflow.projects.templates.launch (API Explorer)
For Dataflow regional endpoints to work, the first one must be used (dataflow.projects.locations.templates.launch). This way, the location parameter in the request will be accepted. Code snippet:
// Assumes `authClient`, `project`, `launchParams`, `env` and `res` are
// defined in the surrounding Cloud Function, as in the question.
var dataflow = google.dataflow({
  version: "v1b3",
  auth: authClient
});

var opts = {
  projectId: project,
  location: "europe-west1",
  gcsPath: "gs://path/to/template",
  resource: {
    parameters: launchParams,
    environment: env
  }
};

// Use the regional-endpoint-aware method so that `location` is honored.
dataflow.projects.locations.templates.launch(opts, (err, result) => {
  if (err) {
    throw err;
  }
  res.send(result.data);
});
I have been testing this through both the API Explorer and the Console using Google-provided templates. Using the wordcount example, I get the same generic error as you do with the API Explorer, which is the same one you get if the location name is incorrect. However, the Console provides more information:
Templated Dataflow jobs using Java or Python SDK version prior to 2.0 are not supported outside of the us-central1 Dataflow Regional Endpoint. The provided template uses Google Cloud Dataflow SDK for Java 1.9.1.
This is documented here, as I previously commented. Running it confirms it's using a deprecated SDK version. I would recommend going through the same process to see if this is actually your case, too.
Choosing a different template, in my case the GCS Text to BigQuery option from the Console's drop-down menu (which uses Apache Beam SDK for Java 2.2.0) with location set to europe-west1 works fine for me (and the job actually runs in that region).
TL;DR: your request is correct in your first example but you'll need to update the template to a newer SDK if you want to use regional endpoints.

Error "Failed to initialize a region" while creating google cloud functions

I am trying to deploy Cloud Functions in the production environment after testing them in the test environment. Everything in the test environment worked fine, but in the prod environment when I create a Cloud Function I get the following error:
Failed to initialize a region
The complete Stackdriver log is here:
{
  protoPayload: {
    #type: "type.googleapis.com/google.cloud.audit.AuditLog"
    status: {
      code: 13
      message: "Failed to initialize a region"
    }
    serviceName: "cloudfunctions.googleapis.com"
    methodName: "google.cloud.functions.v1beta2.CloudFunctionsService.CreateFunction"
    resourceName: "projects/**********/locations/us-central1/functions/load-to-bigquery"
  }
  insertId: "54F77BC951659.A03E406.770C4DE6"
  resource: {
    type: "cloud_function"
    labels: {
      project_id: "**********"
      region: "us-central1"
      function_name: "load-to-bigquery"
    }
  }
  timestamp: "2017-05-14T08:29:27.142Z"
  severity: "ERROR"
  logName: "projects/********/logs/cloudaudit.googleapis.com%2Factivity"
  operation: {
    id: "operations/dWRjb2xsZWN0L3VzLWNlbnRyYWwxL3VkYy10by1icS9NaXRlVlI2MW1xbw"
    producer: "cloudfunctions.googleapis.com"
    last: true
  }
  receiveTimestamp: "2017-05-14T08:29:28.495542553Z"
}
I wanted to know what could be the cause of this error. Both environments are quite similar, and I have the same permissions (Project Editor) in both.
It's possible that the Cloud Functions API was enabled in your Google Cloud Platform project while the product was in Alpha and was somehow missed during the migration. If that's the case and you are still being affected by this issue, you should disable and re-enable the Cloud Functions API on your project, which first requires deleting all of your functions.
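If you prefer the command line, the disable/re-enable cycle (after deleting your functions) would be roughly as follows (a sketch using the gcloud services commands):
gcloud services disable cloudfunctions.googleapis.com
gcloud services enable cloudfunctions.googleapis.com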
If you have issues deleting your functions or re-enabling the API, you should create an issue so the product team can help you.