Does Cloud Run have an equivalent of Cloud Functions' execution_id? - google-cloud-platform

Any record logged from a GCP Cloud Function contains a labels.execution_id, e.g.:
{
"textPayload": "Function execution started",
"insertId": "12mylqhfm6hy8i",
"resource": {
"type": "cloud_function",
"labels": {
"function_name": "redacted",
"region": "europe-west2",
"project_id": "redacted"
}
},
"timestamp": "2022-09-26T10:57:26.917823762Z",
"severity": "DEBUG",
"labels": {
"execution_id": "1l1qb00ft6kv"
},
"logName": "projects/redacted/logs/cloudfunctions.googleapis.com%2Fcloud-functions",
"trace": "projects/redacted/traces/d2f793cf6e2fb149a8ce8dc6fd0498b4",
"receiveTimestamp": "2022-09-26T10:57:26.920210899Z"
}
This is very useful for correlating all logs from a single invocation of the cloud function because it can be filtered upon in Logs Explorer:
labels.execution_id="1l1qb00ft6kv"
I see no equivalent for Cloud Run, though. Cloud Run logs do have labels.instance_id, but my understanding is that this pertains to the Cloud Run app instance, so it will be the same for all invocations on that instance. Hence it's not the same as Cloud Functions' labels.execution_id.
Does Cloud Run have an equivalent of Cloud Functions' execution_id or would I have to roll my own? If the latter, does anyone have any strategies for doing so?

No, there isn't an execution ID, only the instance ID. To get one, you can use instrumentation tools such as OpenTelemetry, as mentioned by guillaume in this Stack Overflow question; you can also refer to this video. Alternatively, you can add a custom or random execution ID to your app's logs yourself (similar to what OpenTelemetry does).
Also have a look at link1 and link2, which might help.
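As a minimal sketch of the "roll your own" approach, the snippet below generates one ID per request and writes structured JSON log lines to stdout. It assumes that Cloud Run's log ingestion maps the special JSON key logging.googleapis.com/labels onto the log entry's labels, so that a filter like labels.execution_id="..." works in Logs Explorer. The function names, the 12-character ID length, and the project_id argument are illustrative choices, not part of any Cloud Run API.

```python
import json
import sys
import uuid


def new_execution_id() -> str:
    # Hypothetical helper: a short random ID, generated once per incoming request.
    return uuid.uuid4().hex[:12]


def build_log_entry(message, execution_id, severity="INFO",
                    trace_header=None, project_id=None):
    # Structured log entry; the special keys are assumed to be recognised
    # by the Cloud Run logging pipeline.
    entry = {
        "severity": severity,
        "message": message,
        "logging.googleapis.com/labels": {"execution_id": execution_id},
    }
    # Optionally attach the request trace, taken from the
    # X-Cloud-Trace-Context header ("TRACE_ID/SPAN_ID;o=1").
    if trace_header and project_id:
        trace_id = trace_header.split("/", 1)[0]
        entry["logging.googleapis.com/trace"] = (
            f"projects/{project_id}/traces/{trace_id}"
        )
    return entry


def log(message, execution_id, **kwargs):
    # One JSON object per line on stdout.
    print(json.dumps(build_log_entry(message, execution_id, **kwargs)),
          file=sys.stdout, flush=True)
```

In a request handler you would call new_execution_id() once at the start of the request and pass the same value to every log() call made while handling it.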

Related

EC2 Instance Vanished

We have a peculiar situation today: one of our EC2 instances has disappeared from the console and we aren't sure what caused it. CloudTrail doesn't have any termination event against this instance ID.
The last CloudTrail event recorded for the instance that went down looks like this:
{
"eventVersion": "1.08",
"userIdentity": {
"type": "AWSService",
"invokedBy": "ec2.amazonaws.com"
},
"eventTime": "2022-03-23T05:46:40Z",
"eventSource": "sts.amazonaws.com",
"eventName": "AssumeRole",
"awsRegion": "ap-south-1",
"sourceIPAddress": "ec2.amazonaws.com",
"userAgent": "ec2.amazonaws.com",
"requestParameters": {
"roleArn": "arn:aws:iam::2************:role/ec2-instance-***********",
"roleSessionName": "i-06135ad01bb90****"
},
"responseElements": {
"credentials": {
"accessKeyId": "<redacted>",
"sessionToken": "<redacted>",
"expiration": "Mar 23, 2022, 12:01:34 PM"
}
},
"requestID": "d9882911-39e7-449b-9701-***********",
"eventID": "0fa1b79b-08aa-48e6-8232-***********",
"readOnly": true,
"resources": [
{
"accountId": "2************",
"type": "AWS::IAM::Role",
"ARN": "arn:aws:iam::2************:role/ec2-instance-***********"
}
],
"eventType": "AwsApiCall",
"managementEvent": true,
"recipientAccountId": "2************",
"sharedEventID": "4b842373-e89d-438b-be3b-*********",
"eventCategory": "Management"
}
The only causes I can think of are either a hardware failure on the AWS side or some crude command run within the OS by a user that took the instance down. Unfortunately, we don't have AWS developer support, as that's quite costly.
Has anyone faced anything similar? Any leads on how I can go about finding the root cause?
For anyone who is interested or may face this in the future: we had to opt for AWS developer support to get an answer. Here is what they had to say:
From the case notes, I understood that the instance
'i-06135ad01bb******' was missing from yesterday. However, you tried
to check cloudtrail and could not find any traces of termination.
Please correct me if I misunderstood your query.
Upon reviewing the case description, I started checking the instance
using our internal tools and observed that the instance got terminated
on '2022-03-23 06:33 UTC' with the reason
'INSTANCE-INITIATED-SHUTDOWN'.
This means that the shutdown got initiated from OS. Please allow me to
inform you that AWS engineers do not have access or visibility to the
customer's instance/OS level due to data privacy[1] and shared
responsibility model[2]. Hence, I will not be in a position to check
how shutdown call got initiated.
To further investigate this, I checked using our internal tools and
could see that you have selected the 'termination' option: on shut
down which means that when an instance gets shutdown it will be
automatically terminated. So, I would request you to change the option
'termination' : on shut down to 'STOP' : on shut down. With this, if
the OS initiates a shutdown by any chance, the instance will be
stopped instead of getting terminated.
As per my analysis, I can confirm that the AWS infrastructure was healthy and there weren't any issues from our end. However, for the future, I would request you to consider the following best practices which you already might be aware of, however I am mentioning them here for the sake of completeness:
Enable termination protection
Regularly back up your data
Preserving root volume after termination:
Thank you, fannymug. I think this was also the case for me.
Just to add:
In CloudTrail, search for the instance ID and select the RunInstances eventName.
There you can check the event details.
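Double-check the deleteOnTermination value. If it is set to true, the attached volume is deleted when the instance is terminated. Note that this is separate from termination protection, which is a different instance setting.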
To avoid this, during the EC2 creation process, look in Advanced details > Termination protection > Enable.
The Shutdown behavior option can also help and can be set to Stop.
I hope it helps.
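For an instance that already exists, the two settings recommended above can also be changed programmatically. The following is a rough sketch, assuming a boto3 EC2 client is passed in; protect_instance is a hypothetical helper name, and the instance ID in the usage comment is a placeholder. It relies on the EC2 ModifyInstanceAttribute operation, which accepts one attribute per call.

```python
def protect_instance(ec2_client, instance_id: str) -> None:
    """Set shutdown behavior to 'stop' and enable termination protection.

    ec2_client is assumed to be a boto3 EC2 client, for example
    boto3.client("ec2", region_name="ap-south-1").
    """
    # An OS-initiated shutdown will now stop the instance instead of
    # terminating it.
    ec2_client.modify_instance_attribute(
        InstanceId=instance_id,
        InstanceInitiatedShutdownBehavior={"Value": "stop"},
    )
    # Termination protection: blocks termination requests made through
    # the console, CLI, or API.
    ec2_client.modify_instance_attribute(
        InstanceId=instance_id,
        DisableApiTermination={"Value": True},
    )


# Usage (placeholder instance ID):
#   import boto3
#   protect_instance(boto3.client("ec2"), "i-0123456789abcdef0")
```

Changing the shutdown behavior to stop is the setting that matters for the scenario in this thread, since the termination was triggered by a shutdown from inside the OS.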

Google Cloud VM Shutdown by "Integrity Event"

It seems like my Google Cloud VM was shut down by an "integrity event":
{
"insertId": "3",
"jsonPayload": {
"lateBootReportEvent": {
"policyEvaluationPassed": false,
"policyMeasurements": [
],
"actualMeasurements": [
]
},
"@type": "type.googleapis.com/cloud_integrity.IntegrityEvent",
"bootCounter": "3"
},
"resource": {
"type": "gce_instance",
"labels": {
"zone": "us-central1-a",
"instance_id": "xxx",
"project_id": "xxx"
}
},
"timestamp": "2022-02-09T03:58:16.830409192Z",
"severity": "ERROR",
"logName": "projects/xxx/logs/compute.googleapis.com%2Fshielded_vm_integrity",
"receiveTimestamp": "2022-02-09T03:58:18.846995634Z"
}
Can those be prevented or even disabled somehow?
The answer depends on what you mean. You are using a Shielded VM, which is designed to:
Prevent tampering with the guest VM image.
Prevent altering sensitive crypto operations.
Prevent exfiltrating secrets sealed in the vTPM.
Prevent modifying the system with UEFI drivers.
Prevent modifying guest firmware.
Prevent modifying the kernel.
Those actions will trigger an integrity event. To prevent an integrity event, do not modify the system.
Refer to logName for more information.
Note: lateBootReportEvent compares the original baseline to the latest boot sequence. The integrity policy baseline is used for comparison with measurements from subsequent VM boots to determine if anything has changed.
What is Shielded VM?

SNS mail notification when a step is not kicked off within a threshold timeframe

I have an EMR step that is submitted through Step Functions. During the run I can see that the task is submitted, but the EMR step is not executed and the EMR console doesn't show any information.
How can I debug this?
How can I send an SNS notification when a step doesn't start executing within a threshold timeframe? In my case, Step Functions shows the EMR task as submitted, but there is no information in the EMR console, and the pipeline keeps running without failing for more than half an hour.
You could start debugging from the Step Functions execution log to identify the specific step that has failed, and then move on to the EMR console or whichever service has failed. Usually, when the EMR step doesn't appear in the EMR console, it is due to a runtime error caused by an exception raised when calling the EMR step.
For this scenario, you can use Step Functions' error handling via the Catch and TimeoutSeconds fields; you can find more details in the AWS documentation here.
Basically, you need to add these fields as shown below:
{
"StartAt": "EmrStep",
"States": {
"EmrStep": {
"Type": "Task",
"Resource": "arn:aws:emr:execute-X-step",
"Comment": "This is your EMR step",
"TimeoutSeconds": 10,
"Catch": [ {
"ErrorEquals": ["States.Timeout"],
"Next": "ShutdownClusterAndSendSNS"
} ],
"End": true
},
"ShutdownClusterAndSendSNS": {
"Type": "Pass",
"Comment": "This step handles the timeout exception raised",
"Result": "You can shutdown the EMR cluster to avoid increased cost here and later send a sns notification!",
"End": true
}
}
}
Note: To catch the timeout, you have to catch the States.Timeout error; you can also define the same Catch field for other types of error.
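The Pass state above only marks where the notification would go. As a rough sketch, the definition can be generated so that the fallback state actually publishes to SNS through the Step Functions service integration arn:aws:states:::sns:publish. The function name, the state names, and the message text are illustrative; the EMR resource ARN, its parameters, and the topic ARN are supplied by the caller and are not taken from the original post.

```python
import json


def build_definition(emr_resource, emr_parameters, timeout_seconds, topic_arn):
    """Return a state machine definition that alerts via SNS on timeout."""
    return {
        "StartAt": "EmrStep",
        "States": {
            "EmrStep": {
                "Type": "Task",
                "Resource": emr_resource,
                "Parameters": emr_parameters,
                # Fail the state with States.Timeout after this many seconds.
                "TimeoutSeconds": timeout_seconds,
                "Catch": [
                    {"ErrorEquals": ["States.Timeout"], "Next": "NotifyTimeout"}
                ],
                "End": True,
            },
            "NotifyTimeout": {
                "Type": "Task",
                # SNS publish service integration.
                "Resource": "arn:aws:states:::sns:publish",
                "Parameters": {
                    "TopicArn": topic_arn,
                    "Message": "EMR step did not finish within the timeout.",
                },
                "End": True,
            },
        },
    }


def to_json(definition) -> str:
    # Serialise for use as the state machine definition string.
    return json.dumps(definition, indent=2)
```

The execution role of the state machine would also need permission to publish to the topic; that part is not shown here.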

Cloud Tasks Not Triggering HTTPrequest Endpoints

I have one simple Cloud Tasks queue and have successfully submitted a task to it. It is supposed to deliver a JSON payload to my API to perform a basic database update. The task is created at the end of a process in a .NET Core 3.1 app running locally on my desktop, triggered by Postman, and the API is a Go app running in Cloud Run. However, the task never seems to fire and never registers an error.
The number of tasks in the queue is always 0 and the tasks running field is always blank. I have hit the "Run Now" button dozens of times, but it never changes anything, and no log entries or failed attempts are ever registered.
The task is created with an OidcToken, with the service account and audience set; the service account has the authorization to create tokens and invoke the Cloud Run service.
Screen Shot of Tasks Queue in Google Cloud Console
Task creation log entry shows that it was created OK:
{
"insertId": "efq7sxb14",
"jsonPayload": {
"taskCreationLog": {
"targetAddress": "PUT https://{readacted}",
"targetType": "HTTP",
"scheduleTime": "2020-04-25T01:15:48.434808Z",
"status": "OK"
},
"@type": "type.googleapis.com/google.cloud.tasks.logging.v1.TaskActivityLog",
"task": "projects/{readacted}/locations/us-central1/queues/database-updates/tasks/0998892809207251757"
},
"resource": {
"type": "cloud_tasks_queue",
"labels": {
"target_type": "HTTP",
"project_id": "{readacted}",
"queue_id": "database-updates"
}
},
"timestamp": "2020-04-25T01:15:48.435878120Z",
"severity": "INFO",
"logName": "projects/{readacted}/logs/cloudtasks.googleapis.com%2Ftask_operations_log",
"receiveTimestamp": "2020-04-25T01:15:49.469544393Z"
}
Any ideas as to why the tasks are not running? This is my first time using Cloud Tasks so don't rule out the idiot between the keyboard and the chair.
Thanks!
You might be using a non-default service. See Configuring Cloud Tasks queues.
Try creating a task from the command line and watch the logs, e.g.:
gcloud tasks create-app-engine-task --queue=default \
--method=POST --relative-uri=/update_counter --routing=service:worker \
--body-content=10
In my own case, I used --routing=service:api and it worked straight away. Then I added AppEngineRouting to the AppEngineHttpRequest.
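For the HTTP-target setup described in the question (a PUT to a Cloud Run endpoint with an OIDC token), it can help to inspect the exact task body being submitted. The sketch below builds the request body in the shape used by the Cloud Tasks REST API's tasks.create method, as far as I understand it; build_http_task is a hypothetical helper, and the URL, service account, and audience are placeholders supplied by the caller. For Cloud Run, the audience is commonly set to the service URL, but treat that as an assumption to verify.

```python
import base64
import json


def build_http_task(url, payload, service_account_email, audience, method="PUT"):
    """Build a tasks.create request body for an HTTP target with OIDC auth."""
    body = json.dumps(payload).encode("utf-8")
    return {
        "task": {
            "httpRequest": {
                "url": url,
                "httpMethod": method,
                "headers": {"Content-Type": "application/json"},
                # The REST API expects the body as a base64-encoded string.
                "body": base64.b64encode(body).decode("ascii"),
                "oidcToken": {
                    "serviceAccountEmail": service_account_email,
                    "audience": audience,
                },
            }
        }
    }
```

Comparing this body with what the .NET client actually sends is one way to rule out a malformed target or token configuration.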

Which casing of property names is considered the "most correct" in a Google Cloud Pub/Sub Push Message?

If you use a "Push" subscription with Google Cloud Pub/Sub, you'll be registering an HTTPS endpoint that receives messages from Google's managed service. This is great if you wish to avoid dependencies on Google Cloud's SDKs and instead trigger your asynchronous services via a traditional web request. However, the intended casing of the properties of the payload is not clear, and since I'm using Push subscriptions I don't have an SDK to defer to for deserialization.
If you look at this documentation, you see references to message_id using snake_case (Update 9/18/18: As stated in Kamal's answer, the documentation was updated since this was incorrect), e.g.:
{
"message": {
"attributes": {
"key": "value"
},
"data": "SGVsbG8gQ2xvdWQgUHViL1N1YiEgSGVyZSBpcyBteSBtZXNzYWdlIQ==",
"message_id": "136969346945",
"publish_time": "2014-10-02T15:01:23.045123456Z"
},
"subscription": "projects/myproject/subscriptions/mysubscription"
}
If you look at this documentation, you see references to messageId using camelCase, e.g.:
{
"message": {
"attributes": {
"key": "value"
},
"data": "SGVsbG8gQ2xvdWQgUHViL1N1YiEgSGVyZSBpcyBteSBtZXNzYWdlIQ==",
"messageId": "136969346945",
"publishTime": "2014-10-02T15:01:23.045123456Z"
},
"subscription": "projects/myproject/subscriptions/mysubscription"
}
If you subscribe to the topics and log the output, you actually get both formats, e.g.:
{
"message": {
"attributes": {
"key": "value"
},
"data": "SGVsbG8gQ2xvdWQgUHViL1N1YiEgSGVyZSBpcyBteSBtZXNzYWdlIQ==",
"messageId": "136969346945",
"message_id": "136969346945",
"publishTime": "2014-10-02T15:01:23.045123456Z",
"publish_time": "2014-10-02T15:01:23.045123456Z"
},
"subscription": "projects/myproject/subscriptions/mysubscription"
}
An ideal response would answer both of these questions:
Why are there two formats?
Is one more correct or authoritative?
The officially correct names for the variables are camel case (messageId), based on the Google JSON style guide. In the early phases of Cloud Pub/Sub, snake case was used for message_id and publish_time, but this was changed later in order to conform to style standards. The snake case names were kept alongside the camel case ones to ensure that push endpoints depending on the original format did not break. The first documentation link you point to apparently had not been updated at the time, and it will be fixed shortly.
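A push endpoint without an SDK can follow this guidance by reading the camel case names first and falling back to the snake case ones. The sketch below is one way to do that; parse_push_message and the returned dictionary keys are illustrative names, and it assumes the data field carries UTF-8 text.

```python
import base64


def parse_push_message(envelope: dict) -> dict:
    """Extract fields from a Pub/Sub push request body (already JSON-decoded)."""
    message = envelope["message"]
    # Prefer the camel case names; fall back to the legacy snake case ones.
    message_id = message.get("messageId", message.get("message_id"))
    publish_time = message.get("publishTime", message.get("publish_time"))
    # data is base64-encoded; decoding to text assumes a UTF-8 payload.
    data = base64.b64decode(message.get("data", "")).decode("utf-8")
    return {
        "message_id": message_id,
        "publish_time": publish_time,
        "attributes": message.get("attributes", {}),
        "data": data,
        "subscription": envelope.get("subscription"),
    }
```

Applied to the sample payloads above, the data field decodes to "Hello Cloud Pub/Sub! Here is my message!".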