Find out which IAM account manually triggered a scheduled function - google-cloud-platform

I have a GCP Cloud function which runs on a schedule every morning. The logs show that it has been triggered off-schedule three other times in the last week, which I presume can only happen if someone has gone to the Cloud Scheduler page and clicked 'Run now' on that function. How can I find out who did this? The Logs Explorer doesn't show this information. (Heads will not roll, but IAM permissions may be stripped. Bonus points if it turns out to have been me.)
For scheduled functions, there are two sets of logs - one for the cloud function triggered by the schedule, and one for the Cloud Scheduler itself. In the logs for the Cloud Scheduler, only the daily schedule shows up, not the extra triggers.
Here is the "Function execution started" entry in the Logs Explorer for the Cloud Function:
{
  "textPayload": "Function execution started",
  "insertId": "REDACTED",
  "resource": {
    "type": "cloud_function",
    "labels": {
      "region": "REDACTED",
      "function_name": "REDACTED",
      "project_id": "REDACTED"
    }
  },
  "timestamp": "2022-05-04T08:49:37.980952884Z",
  "severity": "DEBUG",
  "labels": {
    "execution_id": "REDACTED"
  },
  "logName": "projects/REDACTED/logs/cloudfunctions.googleapis.com%2Fcloud-functions",
  "trace": "projects/REDACTED/traces/REDACTED",
  "receiveTimestamp": "2022-05-04T08:49:37.981500851Z"
}

If the trigger of your Cloud Function is an HTTP endpoint that allows unauthenticated calls, it will be very hard (or nearly impossible) to figure out who called it.
Additionally, it is possible that there were not enough available instances at the scheduled time, and the Cloud Function ran later as a retry.
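If the extra runs did come from someone clicking 'Run now' in the Cloud Scheduler console, one place worth checking is the project's audit logs rather than the function or scheduler execution logs. A minimal sketch, assuming the forced run is recorded as a Cloud Scheduler RunJob call in the audit logs (the exact method name is an assumption) and that you can read logs in the project:
# Hedged sketch: list audit-log entries for manual Cloud Scheduler runs and show who made them.
# The methodName match below is an assumption about how the "Run now" button is logged.
gcloud logging read \
  'resource.type="cloud_scheduler_job" AND protoPayload.methodName:"RunJob"' \
  --project=REDACTED \
  --freshness=7d \
  --format='table(timestamp, protoPayload.authenticationInfo.principalEmail, protoPayload.methodName)'
Any entry whose protoPayload.authenticationInfo.principalEmail is a human account rather than a service account would point at whoever pressed the button (possibly you).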

Related

Conditional waits in AWS Step Functions that monitor database readiness, S3 events and glue jobs?

I have a step function that currently looks like this:
{
  "Comment": "RDS Step Functions",
  "StartAt": "CopyLatestSnapshot",
  "States": {
    "CopyLatestSnapshot": {
      "Type": "Task",
      "Resource": "${aws_lambda_function.snapshot-copy.arn}",
      "Next": "WaitTenMinutes"
    },
    "WaitTenMinutes": {
      "Type": "Wait",
      "Seconds": 600,
      "Next": "ExportSnapshotToS3"
    },
    "ExportSnapshotToS3": {
      "Type": "Task",
      "Resource": "${aws_lambda_function.snapshot-export.arn}",
      "Next": "WaitFiftyMinutes"
    },
I would like to replace the WaitXXXMinutes steps with an event-driven approach, so that each "wait step" listens for some kind of event and moves forward (or fails on timeout) when the actual event happens in the system.
I currently have the following event examples:
database snapshot taken and appears in S3
database is restored from a snapshot
database engine upgraded (so the DB goes offline and comes back online)
snapshot deleted from S3
Glue job finished
Even covering only some of these with conditional waits would be fine.
Note that I cannot use Lambda for these waits, as the Lambda timeout is at most 15 minutes and some operations take longer.
To initiate the workflow, you could configure a CloudWatch Events rule that triggers the state machine. Subsequent steps can use CodeBuild projects to validate the respective states, bounded by a timeout configuration; a sketch of what such a step might look like is shown below.
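For illustration only, here is a minimal sketch of replacing one of the Wait states with a Step Functions task that runs a CodeBuild project synchronously and fails on timeout. The state name (CheckSnapshotExported), the project name (validate-snapshot-export) and the Next target are hypothetical placeholders, not part of the question's definition:
"CheckSnapshotExported": {
  "Type": "Task",
  "Resource": "arn:aws:states:::codebuild:startBuild.sync",
  "Parameters": {
    "ProjectName": "validate-snapshot-export"
  },
  "TimeoutSeconds": 3600,
  "Next": "NextState"
}
With the .sync integration the state machine waits for the build to finish, and TimeoutSeconds bounds that wait, so the state fails on timeout instead of sitting in a fixed Wait, and no Lambda has to run during the wait.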

GCP dataflow quota issue

I'm new to the GCP platform.
I am setting up a Dataflow job that collects data from a Pub/Sub topic and sends it to BigQuery, but I am having a quota issue.
See the error below. I am looking forward to a recommendation. Please help.
{
  "textPayload": "Project gold-apparatus-312313 has insufficient quota(s) to execute this workflow with 1 instances in region us-central1. Quota summary (required/available): 1/7 instances, 4/4 CPUs, 1230/818 disk GB, 0/250 SSD disk GB, 1/99 instance groups, 1/49 managed instance groups, 1/99 instance templates, 1/3 in-use IP addresses.\n\nPlease see https://cloud.google.com/compute/docs/resource-quotas about requesting more quota.",
  "insertId": "fjxvatcdw1",
  "resource": {
    "type": "dataflow_step",
    "labels": {
      "project_id": "76303513563",
      "region": "us-central1",
      "step_id": "",
      "job_id": "2021-05-07_16_07_09-4793338291722263008",
      "job_name": "my_job1"
    }
  },
  "timestamp": "2021-05-07T23:16:20.877954859Z",
  "severity": "WARNING",
  "labels": {
    "dataflow.googleapis.com/region": "us-central1",
    "dataflow.googleapis.com/log_type": "system",
    "dataflow.googleapis.com/job_id": "2021-05-07_16_07_09-4793338291722263008",
    "dataflow.googleapis.com/job_name": "my_job1"
  },
  "logName": "projects/gold-apparatus-312313/logs/dataflow.googleapis.com%2Fjob-message",
  "receiveTimestamp": "2021-05-07T23:16:21.809934742Z"
}
Your error shows an insufficient persistent disk quota (required 1230 GB, available 818 GB), so the recommendation is to increase that quota.
You can request a quota increase for your project; to submit a Compute Engine quota increase request, see:
https://support.google.com/cloud/answer/6075746
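Before filing the request, a quick way to confirm which regional quota is exhausted (a sketch; it assumes gcloud is authenticated against the project shown in the log above):
# List per-region quota metrics with their current usage and limits,
# so you can confirm which quota (likely DISKS_TOTAL_GB here) is exhausted.
gcloud compute regions describe us-central1 \
  --project gold-apparatus-312313 \
  --flatten="quotas[]" \
  --format="table(quotas.metric, quotas.usage, quotas.limit)"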
Reference:
Quotas & limits for Dataflow - https://cloud.google.com/dataflow/quotas

Dataflow logs from Stackdriver

The resource.labels.region field for the dataflow_step logs in Stackdriver points to global, even though the specified regional endpoint is europe-west2.
Any idea what exactly it is pointing to?
Once you supply the GCP Logs Viewer with the desired filter, for example the simplest query based on your inputs, looking for the dataflow_step resource type:
resource.type="dataflow_step"
resource.labels.region="europe-west2"
you should see log entries, formatted as JSON, for all Dataflow jobs in your GCP project that use the europe-west2 regional endpoint:
{
  "insertId": "insertId",
  "jsonPayload": {
    ....
    "message": "Message content",
    ....
  },
  "resource": {
    "type": "dataflow_step",
    "labels": {
      "job_id": "job_id",
      "region": "europe-west2",
      "job_name": "job_name",
      "project_id": "project_id",
      "step_id": "step_id"
    }
  },
  "timestamp": "timestamp",
  "severity": "severity_level",
  "labels": {
    "compute.googleapis.com/resource_id": "resource_id",
    "dataflow.googleapis.com/job_id": "job_id",
    "compute.googleapis.com/resource_type": "resource_type",
    "compute.googleapis.com/resource_name": "resource_name",
    "dataflow.googleapis.com/region": "europe-west2",
    "dataflow.googleapis.com/job_name": "job_name"
  },
  "logName": "logName",
  "receiveTimestamp": "receiveTimestamp"
}
According to the GCP logging documentation, each monitored resource type derives its labels from the underlying service API; dataflow.googleapis.com corresponds to the Dataflow service.
Therefore, if you run a Dataflow job and specify the location for the job's metadata (its region), the logging service picks up that regional endpoint from the job description via the dataflow.googleapis.com REST methods.
The resource.labels.region field on dataflow_step logs should refer to the regional endpoint the job is using; "global" is not an expected value there.
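As an illustration of where that value comes from, here is a minimal sketch of launching a job with an explicit regional endpoint, so the resulting dataflow_step entries carry region="europe-west2" rather than global. The job name, topic and table names are hypothetical placeholders:
# Run a Google-provided Pub/Sub-to-BigQuery template against the europe-west2
# regional endpoint; the --region value is what ends up in resource.labels.region.
gcloud dataflow jobs run my-job \
  --gcs-location gs://dataflow-templates/latest/PubSub_to_BigQuery \
  --region europe-west2 \
  --parameters inputTopic=projects/PROJECT_ID/topics/TOPIC,outputTableSpec=PROJECT_ID:DATASET.TABLE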

How long is an AssumeRoleWithSAML session valid?

I am trying to figure out the usage of an AD user who accesses AWS via AssumeRoleWithSAML, following this link: https://aws.amazon.com/blogs/security/how-to-easily-identify-your-federated-users-by-using-aws-cloudtrail/.
However, I don't see any AssumeRoleWithSAML event at all in my CloudTrail logs, though I can clearly see activity from this user. I went all the way back to early July in CloudTrail to look for AssumeRoleWithSAML and don't see any event.
Am I missing something? Because this event doesn't appear, I am not able to correlate what this user is doing in AWS.
Thanks
Amit
You are right, there should be an event with name AssumeRoleWithSAML in the CloudTrail logs.
You already referenced the correct AWS security blog post which describes how to "identify a SAML federated user". [1]
Let's go into detail.
The IAM docs [2] contain an example of how such an STS federation event looks (the sample below is the documented AssumeRoleWithWebIdentity event; the AssumeRoleWithSAML entry has the same overall structure):
{
  "eventVersion": "1.05",
  "userIdentity": {
    "type": "WebIdentityUser",
    "principalId": "accounts.google.com:[id-of-application].apps.googleusercontent.com:[id-of-user]",
    "userName": "[id of user]",
    "identityProvider": "accounts.google.com"
  },
  "eventTime": "2016-03-23T01:39:51Z",
  "eventSource": "sts.amazonaws.com",
  "eventName": "AssumeRoleWithWebIdentity",
  "awsRegion": "us-east-2",
  "sourceIPAddress": "192.0.2.101",
  "userAgent": "aws-cli/1.3.23 Python/2.7.6 Linux/2.6.18-164.el5",
  "requestParameters": {
    "durationSeconds": 3600,
    "roleArn": "arn:aws:iam::444455556666:role/FederatedWebIdentityRole",
    "roleSessionName": "MyAssignedRoleSessionName"
  },
  "responseElements": {
    "provider": "accounts.google.com",
    "subjectFromWebIdentityToken": "[id of user]",
    "audience": "[id of application].apps.googleusercontent.com",
    "credentials": {
      "accessKeyId": "ASIACQRSTUVWRAOEXAMPLE",
      "expiration": "Mar 23, 2016 2:39:51 AM",
      "sessionToken": "[encoded session token blob]"
    },
    "assumedRoleUser": {
      "assumedRoleId": "AROACQRSTUVWRAOEXAMPLE:MyAssignedRoleSessionName",
      "arn": "arn:aws:sts::444455556666:assumed-role/FederatedWebIdentityRole/MyAssignedRoleSessionName"
    }
  },
  "resources": [
    {
      "ARN": "arn:aws:iam::444455556666:role/FederatedWebIdentityRole",
      "accountId": "444455556666",
      "type": "AWS::IAM::Role"
    }
  ],
  "requestID": "6EXAMPLE-e595-11e5-b2c7-c974fEXAMPLE",
  "eventID": "bEXAMPLE-0b30-4246-b28c-e3da3EXAMPLE",
  "eventType": "AwsApiCall",
  "recipientAccountId": "444455556666"
}
As we can see, requestParameters contains a durationSeconds element, which is the session lifetime you are looking for: 3600 seconds here, i.e. one hour, matching the gap between the eventTime (01:39:51) and the credentials' expiration (2:39:51 AM).
Why is the event missing?
First of all, it is necessary to know whether you are using the AWS CloudTrail console or parsing the CloudTrail files delivered to the S3 bucket. If you use the CloudTrail console, you can only view the last 90 days of recorded API activity and events in an AWS Region! [3]
So make sure that you use AWS Athena or another solution if you must go further back in time.
You must look into the trail of the correct region! You do this by inspecting the respective S3 prefix for a multi-region trail, or by selecting the desired region in the top right corner if you use the AWS CloudTrail console. This is important because regional services log to their respective trail! AWS mentions this as follows:
If you activate AWS STS endpoints in Regions other than the default global endpoint, then you must also turn on CloudTrail logging in those Regions. This is necessary to record any AWS STS API calls that are made in those Regions. For more information, see Turning On CloudTrail in Additional Regions in the AWS CloudTrail User Guide. [4]
Make sure to look into the correct account! You must inspect the trail of the account whose role was assumed. I mention this explicitly because there are multi-account environments which might use centralized identity accounts etc.
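As a quick first check from the CLI (a sketch; the region is a placeholder, and the same 90-day limit as the console applies to lookup-events), you can search one region's event history for the event by name:
# Look up recent AssumeRoleWithSAML events in one region's 90-day event history.
aws cloudtrail lookup-events \
  --region eu-west-1 \
  --lookup-attributes AttributeKey=EventName,AttributeValue=AssumeRoleWithSAML \
  --max-results 10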
References
[1] https://aws.amazon.com/de/blogs/security/how-to-easily-identify-your-federated-users-by-using-aws-cloudtrail/
[2] https://docs.aws.amazon.com/IAM/latest/UserGuide/cloudtrail-integration.html
[3] https://docs.aws.amazon.com/awscloudtrail/latest/userguide/view-cloudtrail-events-console.html
[4] https://docs.aws.amazon.com/IAM/latest/UserGuide/cloudtrail-integration.html

After updating Fargate TaskDefinition, CloudWatch events that trigger tasks fail because of inactive task definitions

I have a series of tasks defined in ECS that run on a recurring schedule. I recently made a minor change to update my task definition in Terraform to change default environment variables for my container (from DEBUG to PRODUCTION):
"environment": [
{"name": "ENVIRONMENT", "value": "PRODUCTION"}
]
I had this task running using the Scheduled Tasks feature of Fargate, setting it at a rate of every 4 hours. However, after updating my task definition, I began to see that the tasks were not being triggered by CloudWatch, since my last container log was from several days ago.
I dug deeper into the issue using CloudTrail, and noticed one particular part of the entry for a RunTask event:
"eventTime": "2018-12-10T17:26:46Z",
"eventSource": "ecs.amazonaws.com",
"eventName": "RunTask",
"awsRegion": "us-east-1",
"sourceIPAddress": "events.amazonaws.com",
"userAgent": "events.amazonaws.com",
"errorCode": "InvalidParameterException",
"errorMessage": "TaskDefinition is inactive",
Further down in the log, I noticed that the task definition ECS was attempting to run was
"taskDefinition": "arn:aws:ecs:us-east-1:XXXXX:task-
definition/important-task-name:2",
However, in my ECS task definitions, the latest version of important-task-name was 3. So it looks like the events are not triggering because I am using an "inactive" version of my task definition.
Is there any way for me to schedule tasks in AWS Fargate without having to manually go through the console and stop/restart/update each cluster's scheduled update? Isn't there any way to simply ask CloudWatch to pull the latest active task definition?
You can use CloudWatch Events rules to control scheduled tasks, and whenever you update a task definition you can also update your rule. Say you have two files:
myRule.json
{
  "Name": "run-every-minute",
  "ScheduleExpression": "cron(0/1 * * * ? *)",
  "State": "ENABLED",
  "Description": "a task that will run every minute",
  "RoleArn": "arn:aws:iam::${IAM_NUMBER}:role/ecsEventsRole",
  "EventBusName": "default"
}
myTargets.json
{
  "Rule": "run-every-minute",
  "Targets": [
    {
      "Id": "scheduled-task-example",
      "Arn": "arn:aws:ecs:${REGION}:${IAM_NUMBER}:cluster/mycluster",
      "RoleArn": "arn:aws:iam::${IAM_NUMBER}:role/ecsEventsRole",
      "Input": "{\"containerOverrides\":[{\"name\":\"myTask\",\"environment\":[{\"name\":\"ENVIRONMENT\",\"value\":\"production\"},{\"name\":\"foo\",\"value\":\"bar\"}]}]}",
      "EcsParameters": {
        "TaskDefinitionArn": "arn:aws:ecs:${REGION}:${IAM_NUMBER}:task-definition/myTaskDefinition",
        "TaskCount": 1,
        "LaunchType": "FARGATE",
        "NetworkConfiguration": {
          "awsvpcConfiguration": {
            "Subnets": [
              "subnet-xyz1",
              "subnet-xyz2"
            ],
            "SecurityGroups": [
              "sg-xyz"
            ],
            "AssignPublicIp": "ENABLED"
          }
        },
        "PlatformVersion": "LATEST"
      }
    }
  ]
}
Now, whenever there's a new revision of myTaskDefinition, you can update your rule and targets, e.g.:
aws events put-rule --cli-input-json file://myRule.json --region $REGION
aws events put-targets --cli-input-json file://myTargets.json --region $REGION
echo 'done'
But of course, replace IAM_NUMBER and REGION with your own account number and region. Note that the TaskDefinitionArn above has no revision suffix; when the revision is omitted, ECS should resolve it to the latest ACTIVE revision of that family, per the RunTask behaviour.
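If you prefer to pin an explicit revision in myTargets.json instead, a small sketch (assuming the file layout above) is to resolve the latest ACTIVE revision first and check it before calling put-targets:
# describe-task-definition with just the family name returns the latest ACTIVE revision;
# its full ARN (including the :N revision suffix) can then be written into myTargets.json.
LATEST_ARN=$(aws ecs describe-task-definition \
  --task-definition myTaskDefinition \
  --region $REGION \
  --query 'taskDefinition.taskDefinitionArn' --output text)
echo "latest active revision: $LATEST_ARN"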
AWS Cloud Map also seems like a solution for these types of problems:
https://aws.amazon.com/about-aws/whats-new/2018/11/aws-fargate-and-amazon-ecs-now-integrate-with-aws-cloud-map/