How to get a filtered list of AWS EMR clusters? - amazon-web-services

I want to get a list of EMR clusters that have a specific tag value.
I looked at the ListClusters API, but it does not support custom filters.
How can I apply a filter in the API call, or implement a two-step solution (first get all clusters and then filter them)?

The service API doesn't expose a tags parameter in the ListClusters request or response, so you would need to first call ListClusters and then call DescribeCluster for every cluster ID to retrieve the tags. An alternative would be to embed the tag data in the cluster name, so that the list of clusters can be filtered by name after the first step, but this is probably not suitable because tags may change.
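The two-step approach can be sketched in Python with boto3. This is a minimal sketch, not a definitive implementation: the function name clusters_with_tag and the example tag key and value are made up for illustration, and the EMR client is passed in so the function can be exercised with a stub.

```python
from typing import Any, Dict, Iterator


def clusters_with_tag(emr: Any, key: str, value: str) -> Iterator[Dict[str, Any]]:
    """Yield full cluster descriptions whose tags contain key=value.

    `emr` is expected to behave like boto3.client("emr").
    """
    # Step 1: ListClusters returns cluster summaries (without tags), page by page.
    paginator = emr.get_paginator("list_clusters")
    for page in paginator.paginate():
        for summary in page.get("Clusters", []):
            # Step 2: DescribeCluster returns the Tags list for one cluster.
            cluster = emr.describe_cluster(ClusterId=summary["Id"])["Cluster"]
            tags = {t["Key"]: t.get("Value") for t in cluster.get("Tags", [])}
            if tags.get(key) == value:
                yield cluster


# Usage (assumes boto3 is installed and AWS credentials are configured;
# the tag key "team" and value "data" are hypothetical):
#
#   import boto3
#   for c in clusters_with_tag(boto3.client("emr"), "team", "data"):
#       print(c["Id"], c["Name"])
```

This makes one DescribeCluster call per cluster, so for accounts with many clusters it may be worth narrowing the first step, for example by passing ClusterStates to paginate().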

Related

Grafana - Get AWS Cost usage by tags

Is there a way to get the costs of AWS resources by tags? It is possible using Linked accounts but I'm trying to figure out if we can filter out costs by tags.
For linked accounts the query is dimension_values(us-east-1, AWS/Billing, EstimatedCharges, LinkedAccount, {"Currency": "USD"})
But I'm not sure what the query is for tags. This is for variable/templating.
This is what normal graph dashboard filtering looks like.
No, that's not possible. The CloudWatch metric EstimatedCharges in the AWS/Billing namespace doesn't provide a tag dimension (it offers dimensions such as ServiceName, LinkedAccount and Currency). AWS Cost Explorer doesn't use CloudWatch metrics; it uses a different AWS API, which is not implemented in Grafana.
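Outside Grafana, the Cost Explorer API mentioned above can group costs by a tag key. The following is a rough sketch using the boto3 "ce" client's get_cost_and_usage call; the function name cost_by_tag is hypothetical, and it assumes the tag has been activated as a cost allocation tag in the billing settings.

```python
from typing import Any, Dict


def cost_by_tag(ce: Any, tag_key: str, start: str, end: str) -> Dict[str, float]:
    """Sum UnblendedCost per tag group between start and end (YYYY-MM-DD).

    `ce` is expected to behave like boto3.client("ce").
    """
    totals: Dict[str, float] = {}
    kwargs: Dict[str, Any] = {
        "TimePeriod": {"Start": start, "End": end},
        "Granularity": "MONTHLY",
        "Metrics": ["UnblendedCost"],
        "GroupBy": [{"Type": "TAG", "Key": tag_key}],
    }
    while True:
        resp = ce.get_cost_and_usage(**kwargs)
        for period in resp.get("ResultsByTime", []):
            for group in period.get("Groups", []):
                # The group key is kept exactly as the API returns it.
                label = group["Keys"][0]
                amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
                totals[label] = totals.get(label, 0.0) + amount
        token = resp.get("NextPageToken")
        if not token:
            return totals
        # Request the next page of results.
        kwargs["NextPageToken"] = token
```

The result could then be fed to a dashboard through some other data source; that part is outside this sketch.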

Change AWS SageMaker LogGroup Prefix?

We have applications for multiple tenants on our AWS account and would like to distinguish between them in different IAM roles. In most places this is already possible by limiting resource access based on naming patterns.
For CloudWatch log groups of SageMaker training jobs however I have not seen a working solution yet. The tenants can choose the job name arbitrarily, and hence the only part of the LogGroup name that is available for pattern matching would be the prefix before the job name. This prefix however seems to be fixed to /aws/sagemaker/TrainingJobs.
Is there a way to change or extend this prefix in order to make such limiting possible? Say, for example /aws/sagemaker/TrainingJobs/<product>-<stage>-<component>/<training-job-name>-... so that a resource limitation like /aws/sagemaker/TrainingJobs/<product>-* becomes possible?
I don't think it is possible to change the log stream names for any of the SageMaker services.

How to generate an inventory report of all AWS Services provisioned after a certain date?

I need to generate a report of all AWS Services that were provisioned after a certain date (say last 3 months).
AWS Service Catalog seems relevant here; but can this be used only if the services were provisioned using CloudFormation Templates?
We did our provisioning using Terraform - can AWS Service Catalog still be used to generate an inventory?
If not, is there an alternate way to generate this report?
You can try using Resource Groups for that: https://eu-central-1.console.aws.amazon.com/resource-groups/home?region=eu-central-1#
There you will find the Tag Editor (https://eu-central-1.console.aws.amazon.com/resource-groups/tag-editor/find-resources?region=eu-central-1), which can list all of your resources.
If you have tagged your resources, you can filter by tag. An alternative would be to tag all resources with the current date, wait one day, search again, and look for resources without that date tag. Those are the resources that were added in between.
To automate this, you can use e.g. the boto3 client: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/resourcegroupstaggingapi.html#client
To get a full solution, you can use Tag Editor to get all resources and then request each resource itself through its service-specific API, e.g. EC2, Lambda, RDS, etc.
This could be time-consuming, so maybe a solution like the one from aquasec could fit your needs.
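As a starting point for the automation, here is a minimal boto3 sketch that lists resource ARNs through the Resource Groups Tagging API and counts them per service. The helper names iter_resource_arns and count_by_service are made up for illustration. As far as I know, this API only returns resources that are (or were) tagged and does not include a creation date, so the per-service lookups described above are still needed for the date filter.

```python
from collections import Counter
from typing import Any, Iterable, Iterator


def iter_resource_arns(tagging: Any) -> Iterator[str]:
    """Yield the ARN of every resource returned by GetResources.

    `tagging` is expected to behave like
    boto3.client("resourcegroupstaggingapi").
    """
    paginator = tagging.get_paginator("get_resources")
    for page in paginator.paginate():
        for mapping in page.get("ResourceTagMappingList", []):
            yield mapping["ResourceARN"]


def count_by_service(arns: Iterable[str]) -> Counter:
    """Count ARNs per service.

    ARN layout: arn:partition:service:region:account-id:resource
    """
    return Counter(arn.split(":")[2] for arn in arns)


# Usage (assumes boto3 is installed and AWS credentials are configured):
#
#   import boto3
#   client = boto3.client("resourcegroupstaggingapi")
#   print(count_by_service(iter_resource_arns(client)))
```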

Programmatically utilize resources created with CloudFormation

I'm creating a bunch of application resources with AWS CloudFormation, and when the resources are created, CloudFormation adds a hash at the end of the name to make it unique.
E.g., if you wanted to create a Kinesis stream named MyStream, the actual name would be something like my-stack-MyStream-1F8ISNCLP0W4O.
I want to be able to programmatically access the resources without having to know the hash, without having to query AWS for my resources to match the names myself, and without manual steps. Does anybody know a convenient way to use AWS resources in your application programmatically and predictably?
Here are the less ideal options I can think of:
1. Set a tag on the resource (i.e. name -> MyStream) and query AWS to get the actual resource name.
2. Query AWS for a list of resource names and look for a partial match on the expected name.
3. After you create your resources, manually copy the actual names into your config file (probably the sanest of these options).
You can use the CloudFormation API to get a list of resources in your stack. This will give you a list of logical ids (i.e. the name in your CloudFormation template without the hash) and matching physical ids (with the stack name and hash). Using the AWS CLI, this will show a mapping between the two ids:
aws cloudformation describe-stack-resources \
  --query 'StackResources[].[LogicalResourceId,PhysicalResourceId]' \
  --stack-name <my-stack>
CloudFormation APIs to do the same query are provided in all the various language SDKs provided by Amazon.
You can use this as an alternative to #1, by querying CloudFormation at runtime, or to #3, by querying CloudFormation at build time and embedding the results in a config file. I don't see any advantage to using your own tags over simply querying the CF API. #2 will cause problems if you want two or more stacks from the same template to coexist.
I've used both the runtime and build time approaches. The build time approach lets you remove the dependency on or knowledge of CloudFormation, but needs stack specific information in your config file. I like the runtime approach to allow the same build to be deployed to multiple stacks and all it needs is the stack name to find all the related resources.
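The runtime approach might look like the following boto3 sketch. The function name physical_ids is hypothetical and the stack and resource names are the examples from the question. To my knowledge describe_stack_resources returns at most 100 resources per stack, so larger stacks would need the paginated list_stack_resources call instead.

```python
from typing import Any, Dict


def physical_ids(cfn: Any, stack_name: str) -> Dict[str, str]:
    """Map logical resource IDs to physical resource IDs for one stack.

    `cfn` is expected to behave like boto3.client("cloudformation").
    """
    resp = cfn.describe_stack_resources(StackName=stack_name)
    return {
        r["LogicalResourceId"]: r["PhysicalResourceId"]
        for r in resp["StackResources"]
        # Resources that are still being created may not have a physical ID yet.
        if "PhysicalResourceId" in r
    }


# Usage (assumes boto3 is installed and AWS credentials are configured):
#
#   import boto3
#   ids = physical_ids(boto3.client("cloudformation"), "my-stack")
#   stream_name = ids["MyStream"]
```

With this, the application only needs the stack name in its configuration and can look up everything else by the logical names from the template.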

Can I programmatically find all untagged resources?

The Tag Editor in AWS's web console allows me to search for "All resource types" where a specific tag is not present. For example, I can list everything that is missing the tag "environment".
I'd like to run this as a periodic check, to enforce that no new untagged resources have been created. Some Boto code (running as a Lambda cron job) seems like a good fit. However, the Boto docs only show me how to look at a specific resource type (e.g. EC2 instances).
Is there any API for asking about tags in general? Or do I need to enumerate every resource type?
Just posting here in case someone has the same question in the future.
AWS Resource Groups offers features like this. You can access Resource Groups in the AWS console at https://console.aws.amazon.com/resource-groups/home.
I couldn't find a way to use --tag-filters to match untagged resources in the CLI, so I used jq to filter the results.
Here is a sample command to get all resources without the environment tag.
aws resourcegroupstaggingapi get-resources --tags-per-page 100 | jq '.ResourceTagMappingList[] | select(contains({Tags: [{Key: "environment"} ]}) | not)'
get-resources CLI reference (resourcegroupstaggingapi): https://docs.aws.amazon.com/cli/latest/reference/resourcegroupstaggingapi/get-resources.html
For more information about the Resource Groups Tagging API, see https://docs.aws.amazon.com/resourcegroupstagging/latest/APIReference/API_GetResources.html
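Since the question mentions Boto code running as a Lambda cron job, the same filter as the jq command above can be sketched in Python with boto3. The function name resources_missing_tag is made up for illustration. Note that tag keys are case-sensitive, so "environment" and "Environment" are different keys.

```python
from typing import Any, Iterator


def resources_missing_tag(tagging: Any, tag_key: str) -> Iterator[str]:
    """Yield ARNs of resources that do not carry the given tag key.

    `tagging` is expected to behave like
    boto3.client("resourcegroupstaggingapi").
    """
    paginator = tagging.get_paginator("get_resources")
    for page in paginator.paginate():
        for mapping in page.get("ResourceTagMappingList", []):
            keys = {tag["Key"] for tag in mapping.get("Tags", [])}
            if tag_key not in keys:
                yield mapping["ResourceARN"]


# Usage (assumes boto3 is installed and AWS credentials are configured):
#
#   import boto3
#   client = boto3.client("resourcegroupstaggingapi")
#   for arn in resources_missing_tag(client, "environment"):
#       print(arn)
```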
You can use AWS Resource Groups from the console, per this write-up, to find resources that are missing a tag. To find resources that do not have the tag key at all, choose (not tagged); to find resources that have the tag key but no tag value, choose (empty value).
If you are looking for automated alerting, consider using AWS Config Rules and take a look at this related blog as well. In particular, there is a rule template called "required_tags" that checks for the presence of up to 5 tags. You can run more instances of the rule as needed, or modify the code. Find links to that and other rule templates here.
I also found a nice blog that helps answer the question by using filtering when invoking service APIs via the CLI.
I also found that AWS Config works pretty well too. Once AWS Config is set up for a particular AWS Region, you can submit an advanced query to find missing tags, like this one for a missing tag on EC2 resources:
SELECT
resourceId,
resourceType,
configuration.instanceType,
configuration.placement.tenancy,
configuration.imageId,
tags,
availabilityZone
WHERE
resourceType = 'AWS::EC2::Instance'
AND tags.key NOT LIKE 'owner'
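To run such an advanced query as a periodic check rather than from the console, a sketch with the boto3 "config" client's select_resource_config call could look like this. The function name run_config_query is hypothetical; each result row comes back as a JSON string, which is decoded here.

```python
import json
from typing import Any, Dict, Iterator


def run_config_query(config: Any, expression: str) -> Iterator[Dict[str, Any]]:
    """Run an AWS Config advanced query and yield each row as a dict.

    `config` is expected to behave like boto3.client("config").
    """
    kwargs: Dict[str, Any] = {"Expression": expression}
    while True:
        resp = config.select_resource_config(**kwargs)
        for row in resp.get("Results", []):
            # Each entry in Results is a JSON-encoded string.
            yield json.loads(row)
        token = resp.get("NextToken")
        if not token:
            return
        # Request the next page of results.
        kwargs["NextToken"] = token


# Usage (assumes boto3 is installed, credentials are configured,
# and AWS Config is recording in the Region):
#
#   import boto3
#   query = "SELECT resourceId, tags WHERE resourceType = 'AWS::EC2::Instance'"
#   for row in run_config_query(boto3.client("config"), query):
#       print(row["resourceId"])
```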
There is no API for tags in general. You have to do it for every service type. It is not that difficult. I have a Lambda that gets executed (through S3 PutObject / CloudTrail) which checks for newly created instances and tags them if needed. It is very easy to extend it to other types of AWS services, since CloudTrail monitors most AWS services. But if you are looking to find all untagged resources, then you have to write a Boto script and query the tags for each service type.