[screenshot of the Control Tower error]
I'm getting this error in AWS Control Tower.
I've tried re-registering all OUs and updating the landing zone, but I left AWS CloudTrail disabled because we already have a solution in place to manage CloudTrail trails. Is this issue related to that? Do I have any option to get rid of this error?
We host quite a few things on our GCP project and it's nice to be alerted on new errors, but I want to send notifications to PagerDuty only from my production Kubernetes cluster.
Is there a way to do this, or should I filter this somehow in PagerDuty (unsure if that's possible; I'm still new to it)?
Here is the procedure for sending notifications from Kubernetes to PagerDuty:
A metric needs to be created based on the requirement, and that metric is then used when we create an alert. Further along, on the notifications page, you can select PagerDuty and proceed with creating the alert.
Step 1:
Creating a log-based metric:
1. In the console, go to the log-based metrics page, choose Create metric, and create a new custom metric.
2. Set the metric type to Counter and, under Details, add a log metric name (e.g. user/delete).
3. In the metric, give the query that fetches the logs of the errors you expect to be alerted on, and create the metric.
Step 2:
Creating an alert policy:
1. In the console, go to the Alerting page, choose Create policy, and create a new alerting policy.
2. Go to Add condition; the resource type is the resource that should trigger the alert (in our case, a Kubernetes pod), and the metric is the one we created in Step 1.
3. In the filter, add the project ID, and set a suitable period. Next, in the configuration, add these details and proceed to the next steps, leaving the rest of the fields at their defaults.
Step 3:
1. Next you will be directed to select notification channels. Go to Manage notification channels, select PagerDuty services, choose Add new, add the display name and the service key, check connectivity, save, and proceed.
2. Add the alert name and save the alert.
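If you prefer the CLI, Step 1 and the PagerDuty channel from Step 3 can be sketched with gcloud. This is only a sketch: the metric name, cluster name, and service key below are placeholders, and the exact flags for the channel command may vary by gcloud version. The log filter is also where you restrict alerting to the production cluster only, which answers the original question.

```shell
# Step 1: create a counter log-based metric scoped to the production
# cluster only (cluster name and metric name are placeholders).
gcloud logging metrics create prod-error-count \
  --description="Error logs from the production cluster" \
  --log-filter='resource.type="k8s_container"
    resource.labels.cluster_name="prod-cluster"
    severity>=ERROR'

# Step 3: create a PagerDuty notification channel
# (the service key is a placeholder).
gcloud beta monitoring channels create \
  --display-name="PagerDuty - prod" \
  --type=pagerduty \
  --channel-labels=service_key=YOUR_PAGERDUTY_SERVICE_KEY
```

The alert policy itself (Step 2) can then be created in the console and pointed at this metric and channel.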
I am new to using OpsCenter. I got this alarm saying
"Issue detected for EC2 infrastructure"
Description: "We are investigating increased API error rates in the us-east-1 Region"
I don't have any idea what this is, and I'm not able to get a clear description of what the issue is about.
Can anyone help me out?
This is just AWS Health alerting you that there is an ongoing incident in that region.
You can view more information in your Personal Health Dashboard.
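The same information can also be pulled from the AWS Health API via the CLI. Note this is a sketch: the Health API endpoint lives in us-east-1, and the `describe-events` call requires a Business or Enterprise support plan.

```shell
# List currently open AWS Health events
# (requires a Business/Enterprise support plan).
aws health describe-events \
  --region us-east-1 \
  --filter eventStatusCodes=open
```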
There are many documents that explain how to resolve this error. I've checked many of them and tried their suggestions; however, following them does not resolve the issue for me.
Error I get is
An error occurred (ValidationError) when calling the PutLifecycleHook operation: Unable to publish test message to notification target arn:aws:sqs:xxxxx:XXXXX:kubeeventsqueue.fifo using IAM role arn:aws:iam::XXXXXXXXX:role/kubeautoscaling. Please check your target and role configuration and try to put lifecycle hook again.
The command I am using is:
aws autoscaling put-lifecycle-hook --lifecycle-hook-name terminate --auto-scaling-group-name mygroupname --lifecycle-transition autoscaling:EC2_INSTANCE_TERMINATING --role-arn arn:aws:iam::XXXXXX:role/kubeautoscaling --notification-target-arn arn:aws:sqs:xxxxx:XXXXXXX:kubeeventsqueue.fifo
Note that I have replaced the actual IDs with XXXXX above.
The role concerned (arn:aws:iam::XXXXXX:role/kubeautoscaling) has a trust relationship with autoscaling.amazonaws.com. It also has the "AutoScalingNotificationAccessRole" policy attached.
While testing, I also tried adding a permission allowing everybody to perform all SQS actions (SQS:*). (I removed it after testing, though.)
I have also tried creating the SQS queue first and then configuring --notification-target-arn, without any success.
Any help on this would be much appreciated.
It appears that you are using an Amazon SQS FIFO (first-in-first-out) queue.
From Configuring Notifications for Amazon EC2 Auto Scaling Lifecycle Hooks - Receive Notification Using Amazon SQS:
FIFO queues are not compatible with lifecycle hooks.
I don't know whether this is the cause of your current error, but it would prohibit your desired configuration from working.
Yes, FIFO queues are definitely not supported by lifecycle hooks. I wasted a lot of time fiddling with permissions and queue configuration only to finally find that FIFO is not supported. It would be nice if this were much more prominent in the documentation, because 1) it's not at all obvious or intuitive, and 2) the error message received suggests it's a permissions problem. How about explicitly stating "FIFO queues are not supported" instead of "Failed to send test message..."?
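Given that, the fix is to point the lifecycle hook at a standard (non-FIFO) queue instead. A rough sketch, reusing the names from the question (the queue name `kubeevents` without the `.fifo` suffix is an assumption):

```shell
# Create a standard (non-FIFO) queue: no .fifo suffix, no FifoQueue attribute.
QUEUE_URL=$(aws sqs create-queue --queue-name kubeevents \
  --query QueueUrl --output text)

# Look up the queue's ARN to use as the notification target.
QUEUE_ARN=$(aws sqs get-queue-attributes --queue-url "$QUEUE_URL" \
  --attribute-names QueueArn \
  --query Attributes.QueueArn --output text)

# Re-create the lifecycle hook against the standard queue.
aws autoscaling put-lifecycle-hook \
  --lifecycle-hook-name terminate \
  --auto-scaling-group-name mygroupname \
  --lifecycle-transition autoscaling:EC2_INSTANCE_TERMINATING \
  --role-arn arn:aws:iam::XXXXXX:role/kubeautoscaling \
  --notification-target-arn "$QUEUE_ARN"
```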
I'm trying to put CloudWatch Logs into Kinesis Data Firehose.
I followed this:
https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/SubscriptionFilters.html#FirehoseExample
and got this error:
An error occurred (InvalidParameterException) when calling the PutSubscriptionFilter operation: Could not deliver test message to specified Firehose stream. Check if the given Firehose stream is in ACTIVE state.
aws logs put-subscription-filter --log-group-name "xxxx" --filter-name "xxx" --filter-pattern "{$.httpMethod = GET}" --destination-arn "arn:aws:firehose:us-east-1:12345567:deliverystream/xxxxx" --role-arn "arn:aws:iam::12344566:role/xxxxx"
You need to update the trust policy of your IAM role so that it grants the logs.amazonaws.com service principal permission to assume it; otherwise CloudWatch Logs won't be able to assume your role to publish events to your stream. (Obviously you should also double-check the role's permissions to make sure it can read from your log group and write to your Firehose delivery stream.)
It would be nice if they added this to the error message to help point people in the right direction...
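Concretely, the role's trust policy needs to name the CloudWatch Logs service principal; something like the following sketch (the role name `CWLtoFirehoseRole` is a placeholder for whatever you passed via --role-arn):

```shell
# trust.json: allow CloudWatch Logs to assume the role.
cat > trust.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "logs.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

# Apply the trust policy to the role (role name is a placeholder).
aws iam update-assume-role-policy \
  --role-name CWLtoFirehoseRole \
  --policy-document file://trust.json
```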
The most likely cause of this error is a permissions issue, i.e. something wrong in the definition of the IAM role you passed to --role-arn. You may want to double-check that the role and its permissions were set up properly as described in the doc.
I was getting a similar error when subscribing to a CloudWatch log group and publishing to a Kinesis stream. CDK was not defining a dependency needed for the SubscriptionFilter to be created after the policy that allows the filtered events to be published to Kinesis. This is reported in this GitHub CDK issue:
https://github.com/aws/aws-cdk/issues/21827
I ended up using the workaround implemented by GitHub user AlexStasko: https://github.com/AlexStasko/aws-subscription-filter-issue/blob/main/lib/app-stack.ts
If your Firehose stream is in ACTIVE status and you can send to the log stream, then the remaining issue is the policy.
I ran into a similar issue when following the tutorial. The confusing part is that the Kinesis section and the Firehose section are easy to mix up. Recheck your ~/PermissionsForCWL.json, in particular:
....
"Action": ["firehose:*"],   // easily confused with kinesis:*, as in my case
"Resource": ["arn:aws:firehose:region:123456789012:*"]
....
When I did the tutorial you mentioned, it was defaulting to a different region, so I had to pass --region with my region. It wasn't until I redid all the steps in the correct region that it worked.
For me, I think this issue was occurring due to the time it takes for the IAM data plane to settle after new roles are created via regional IAM endpoints in regions that are geographically far from us-east-1.
I have a custom Lambda CloudFormation resource that auto-subscribes all existing and future log groups to a Firehose via a subscription filter. The IAM role gets deployed for CloudWatch Logs, then very quickly the Lambda function tries to subscribe the log groups, and on occasion this error would happen.
I added a time.sleep(30) to my code (this code only runs once at stack creation, so it's not going to hurt anything to wait 30 seconds).
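Instead of a fixed sleep, a small retry loop around the failing call is usually more robust against this kind of eventual consistency. A generic sketch; the `flaky` function below is only a stand-in for the real put-subscription-filter call:

```shell
#!/bin/sh
# Retry a command up to $1 times, sleeping $2 seconds between attempts.
retry() {
  attempts=$1; delay=$2; shift 2
  i=1
  while ! "$@"; do
    [ "$i" -ge "$attempts" ] && return 1
    i=$((i + 1))
    sleep "$delay"
  done
  return 0
}

# Stand-in for the real AWS call: fails twice, succeeds on the third attempt.
COUNT_FILE=$(mktemp)
echo 0 > "$COUNT_FILE"
flaky() {
  n=$(($(cat "$COUNT_FILE") + 1))
  echo "$n" > "$COUNT_FILE"
  [ "$n" -ge 3 ]
}

retry 5 0 flaky && echo "succeeded after $(cat "$COUNT_FILE") attempts"
```

In the real resource you would call `retry 5 30 aws logs put-subscription-filter ...` so transient IAM propagation failures are retried instead of failing the stack.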
We have a team-shared AWS account where things are sometimes hard to debug. In particular, for the EMR APIs, throttling happens regularly, so it would be nice to have CloudTrail logs tell us who is not being nice when using EMR. I think our CloudTrail logging is enabled, since I can see these API events with EMR as the event source:
AddJobFlowSteps
RunJobFlow
TerminateJobFlows
I'm pretty sure that I'm calling DescribeCluster plenty of times and causing some throttling, but I'm not sure why those calls are not showing up in my CloudTrail logs...
Can someone help me understand:
Is there an additional setting needed for the DescribeCluster EMR API in order to log events to CloudTrail?
And what about the other EMR APIs? Can they be configured to log events to CloudTrail without the SDK explicitly writing to CloudTrail?
I have read these articles; it feels like much can be done in CloudTrail...
https://docs.aws.amazon.com/emr/latest/ManagementGuide/logging_emr_api_calls.html
https://docs.aws.amazon.com/awscloudtrail/latest/userguide/logging-management-and-data-events-with-cloudtrail.html#logging-management-events-with-the-cloudtrail-console.
https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-supported-services.html
Appreciate any help!
A quick summary of AWS CloudTrail:
The events recorded by AWS CloudTrail are of two types: management events and data events.
Management events include actions like stopping an instance, deleting a bucket, etc.
Data events are available only for a few services (notably S3 and Lambda) and include actions like "object 'abc.txt' was read from the S3 bucket".
Under management events, we again have 4 types:
Write-only
Read-only
All (both reads and writes)
None
The DescribeCluster event that you are looking for falls under the 'Read-only' management event type.
Please ensure that you have selected "All" or "Read-only" as the management event type in your CloudTrail trail.
Selecting "Write-only" as the management event type in your trail will not record DescribeCluster.
There is no other AWS-service-specific setting that you can enable in CloudTrail.
Also note that the 'Event history' tab in the AWS CloudTrail console records all types of events (including read-only) for a period of 90 days. You can see the DescribeCluster event there too.
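Event history is also queryable from the CLI, which is handy for finding out who is making the throttled calls. A sketch using `aws cloudtrail lookup-events` (the `--query` projection assumes the default event fields):

```shell
# List recent DescribeCluster calls from CloudTrail Event history,
# showing when each call was made and by whom.
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=EventName,AttributeValue=DescribeCluster \
  --max-results 50 \
  --query 'Events[].[EventTime,Username]' \
  --output table
```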