We have multiple users that submit queries to AWS Athena concurrently. Is there any way, via the Athena CLI, to find out the submitter, given a query execution ID?
Based on the AWS Athena doc, it seems that this is not supported.
https://docs.aws.amazon.com/cli/latest/reference/athena/
Use the CloudTrail service instead.
From the user guide:
Using CloudTrail, you can determine the request that was made to Athena, the IP address from which the request was made, who made the request, when it was made, and additional details.
More details here: https://docs.amazonaws.cn/en_us/athena/latest/ug/monitor-with-cloudtrail.html
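For example, here is a minimal boto3 sketch of that CloudTrail route. It assumes the StartQueryExecution event is still within CloudTrail's 90-day event history for the region, and that the query execution ID surfaces in the event's responseElements (the ID below is a placeholder):

# Minimal sketch: find who submitted an Athena query, given its query
# execution ID, by scanning CloudTrail's StartQueryExecution events.
# Assumes the event is within CloudTrail's 90-day event history.
import json
from datetime import datetime, timedelta

import boto3

def find_submitter(query_execution_id, days_back=7):
    cloudtrail = boto3.client("cloudtrail")
    paginator = cloudtrail.get_paginator("lookup_events")
    pages = paginator.paginate(
        LookupAttributes=[
            {"AttributeKey": "EventName", "AttributeValue": "StartQueryExecution"}
        ],
        StartTime=datetime.utcnow() - timedelta(days=days_back),
        EndTime=datetime.utcnow(),
    )
    for page in pages:
        for event in page["Events"]:
            detail = json.loads(event["CloudTrailEvent"])
            response = detail.get("responseElements") or {}
            if response.get("queryExecutionId") == query_execution_id:
                return detail["userIdentity"]  # who made the request
    return None

print(find_submitter("11111111-2222-3333-4444-555555555555"))  # placeholder ID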
Is there an API/way to programmatically query AWS documentation for a specific service? For instance, I want to know the encryption algorithm used by a service for protecting data at rest. Can I write a script that will automatically query AWS documentation for that service and give me this information?
There is no API for AWS Documentation.
However, the AWS CLI is open-source and it has data files that detail all API calls and their parameters.
It would not, however, contain the encryption algorithms. That is internal to Amazon S3 and is not shared publicly.
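To illustrate, here is a small sketch of reading those API model files through botocore, the library underneath the AWS CLI; it surfaces operations and parameters, not internal implementation details:

# Sketch: inspect the API model files bundled with botocore.
import botocore.session

session = botocore.session.get_session()
model = session.get_service_model("s3")

print(model.operation_names[:5])       # first few S3 API operations
op = model.operation_model("PutObject")
print(sorted(op.input_shape.members))  # parameter names for PutObject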
Is it possible to directly access AWS Glue Data Catalog of Account B via the Athena interface of Account A?
I was just trying to resolve this same issue in my own setup, but then stumbled across this bummer (the last bullet under Cross-Account Access Limitations on this page):
Cross-account access to the Data Catalog is not supported when using an AWS Glue crawler, Amazon Athena, or Amazon Redshift.
So it sounds like even with the cross-account access that is possible today, it won't naturally carry through to those services (including Athena, which this question asks about).
That said, I was able to set up cross-account access to the AWS Glue Data Catalog in a way that let me use Account A to pull all the relevant info about Data Catalog objects from Account B. I can update my answer with how far I got, if you want. One hacky method that might solve this question: set up the cross-account access that is possible today, then run a recurring Lambda function that replicates all the relevant Data Catalog metadata from Account B to Account A, so users in Account A can view it within Account A's AWS Glue Data Catalog (a rough sketch of that idea follows below). I'm not sure whether Athena specifically would work in that setup, as I know it requires PutObject access when it queries data in S3 (solvable via the appropriate S3 bucket policies, but that would be another piece of cross-account permissions to manage).
Let me know whether you'd like to see those details on what cross-account stuff I was able to get working.
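In the meantime, here is a rough boto3 sketch of that replication hack, under some loud assumptions: it runs in Account A, it can assume a read role in Account B (the role ARN and database name are placeholders), and the target database already exists in Account A:

# Rough sketch: copy Glue Data Catalog table metadata from Account B
# into Account A. Role ARN and database name are placeholders, and the
# database is assumed to already exist in Account A.
import boto3

def glue_client_for_account_b():
    sts = boto3.client("sts")
    creds = sts.assume_role(
        RoleArn="arn:aws:iam::222222222222:role/GlueReadOnly",  # placeholder
        RoleSessionName="catalog-replication",
    )["Credentials"]
    return boto3.client(
        "glue",
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )

def replicate_database(database_name):
    source = glue_client_for_account_b()
    target = boto3.client("glue")  # local client in Account A
    for page in source.get_paginator("get_tables").paginate(DatabaseName=database_name):
        for table in page["TableList"]:
            # TableInput accepts only a subset of the keys GetTables returns
            table_input = {k: v for k, v in table.items()
                           if k in ("Name", "Description", "Owner", "Parameters",
                                    "PartitionKeys", "StorageDescriptor", "TableType")}
            try:
                target.create_table(DatabaseName=database_name, TableInput=table_input)
            except target.exceptions.AlreadyExistsException:
                target.update_table(DatabaseName=database_name, TableInput=table_input)

replicate_database("my_database")  # placeholder database name

You would run this on a schedule (e.g. via a CloudWatch Events rule) to keep Account A's copy of the metadata reasonably fresh.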
AWS has started supporting this using Lambda; please follow the link below:
https://aws.amazon.com/blogs/big-data/cross-account-aws-glue-data-catalog-access-with-amazon-athena/
Since May 2021 it is possible to register a data catalog from a different account in Amazon Athena; see the User Guide.
Athena engine version 2 is required, though, and there are some other limitations.
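As a rough sketch of this newer approach with boto3 (the catalog name and account ID are placeholders, and Account B still needs a Glue resource policy that allows Account A to read its catalog):

# Sketch: register Account B's Glue Data Catalog in Account A's Athena
# (the May 2021 feature). Catalog name and account ID are placeholders.
import boto3

athena = boto3.client("athena")
athena.create_data_catalog(
    Name="account_b_catalog",
    Type="GLUE",
    Description="Glue Data Catalog in Account B",
    Parameters={"catalog-id": "222222222222"},  # Account B's account ID
)
# Then query it as: SELECT * FROM "account_b_catalog"."some_db"."some_table"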
Can you get an AWS usage report for a subdirectory of a bucket? I want to know the amount of traffic from all 'GetObject' requests for each subdirectory of S3.
First, remember that there are no "subdirectories" in S3. Everything within a bucket sits in a flat index and is identified by an object key. In the AWS console, however, objects that share a common prefix are displayed together in a "folder" named after that prefix.
With that in mind, it should be easier to understand why you cannot get an AWS usage report for a specific "subdirectory". The AWS usage report is meant to be an overview of your AWS services and is not meant to be used for more detailed analytics.
Instead, there is another AWS service that gives you more detailed analytics for your other AWS services: Amazon CloudWatch. With CloudWatch you can:
- Set up daily storage metrics
- Set up request (GET) metrics on a bucket
And, for your specific case, you can set up request metrics for specific prefixes (subdirectories) within a bucket, as sketched below.
Using request metrics from AWS CloudWatch is a paid feature (and another reason why you cannot get detailed request metrics in the AWS usage report).
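For the prefix case, here is a minimal boto3 sketch (the bucket, configuration ID, and prefix are placeholders); once enabled, metrics such as GetRequests show up in CloudWatch under the AWS/S3 namespace, keyed by the BucketName and FilterId dimensions:

# Sketch: enable CloudWatch request metrics for one prefix
# ("subdirectory") of a bucket. Names below are placeholders.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_metrics_configuration(
    Bucket="my-bucket",
    Id="reports-prefix-metrics",
    MetricsConfiguration={
        "Id": "reports-prefix-metrics",
        "Filter": {"Prefix": "reports/"},  # the "subdirectory" to measure
    },
)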
I'm starting with Alexa development and AWS in general. I've signed up for the free tier, created my skill, set up an AWS Lambda function, and done a little testing. I have nothing else running on AWS.
What I've noticed is that, besides AWS Lambda and CloudWatch usage, my Billing Dashboard shows requests to AWS Key Management Service. I'm not using any environment variables, which was one of the causes of KMS requests suggested by Google results.
My billing report shows three times more KMS requests than Lambda requests (30 vs. 9). I know these are small numbers, but the free tier includes 20,000 KMS requests versus 1,000,000 Lambda requests, and I just don't understand how the two are connected.
Is AWS KMS required for Lambda operation? What is it used for?
Many AWS services use KMS to manage keys and access to keys while keeping them under your control.
The full list is documented here: https://docs.aws.amazon.com/kms/latest/developerguide/service-integration.html
KMS pricing is per key that you create and manage: https://aws.amazon.com/kms/pricing/
Keys automatically created by AWS services are free.
I just checked my bill and I am not charged for KMS at all.
I suggest you enable CloudTrail logs on your account to understand where the KMS calls you're seeing originate from.
To query CloudTrail logs, you can run a simple SQL query in Athena.
Docs to set up Athena for CloudTrail: https://docs.aws.amazon.com/athena/latest/ug/cloudtrail-logs.html
SQL query to analyze the KMS calls:
SELECT eventtime,
       useridentity.type,
       eventsource,
       eventname,
       sourceipaddress
FROM "default"."cloudtrail_logs_logs_sst_cloudtrail"
WHERE eventsource = 'kms.amazonaws.com'
  AND eventtime BETWEEN '2018-07-01' AND '2018-07-31';
I'm using Terraform to provision some resources on AWS. Running the "plan" step of Terraform fails with the following vague error (for example):
Error: Error loading state: AccessDenied: Access Denied
status code: 403, request id: ABCDEF12345678, host id: SOMELONGBASE64LOOKINGSTRING===
Given a request ID and a host ID, is it possible to see in more depth what went wrong?
Setting TF_LOG=DEBUG (or some other level) seems to help, but I was curious if there is a CLI command to get more information from CloudTrail or something.
Thanks!
Terraform won't have any privileged information about the access denial, but AWS does. Because you mentioned S3 was the problem, I based my answer on finding the S3 request ID. You have a couple of options to find the request given a request ID in AWS:
1. Create a trail in AWS CloudTrail. CloudTrail will log the API calls (including the request ID) at the bucket level by default. If the request was for a specific object, you need to enable S3 data events when you create the trail.
2. Turn on S3 server access logs.
You can manually search for the request ID in the log files in S3, or use Athena (a sketch of the Athena route follows below). For CloudTrail, you can also configure CloudWatch Logs and search within the log group that gets created, using the search bar.
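As a rough boto3 sketch of the Athena route, assuming you have already created a CloudTrail table per the Athena setup docs (the table name, output location, and request ID are placeholders). Note that CloudTrail's LookupEvents API only returns management events, so object-level S3 calls have to be found in the trail's log files like this:

# Sketch: search CloudTrail logs in Athena for a given S3 request ID.
# Table name, output bucket, and request ID are placeholders.
import time

import boto3

athena = boto3.client("athena")

query = """
SELECT eventtime, eventname, errorcode, errormessage, useridentity.arn
FROM "default"."cloudtrail_logs"
WHERE eventsource = 's3.amazonaws.com'
  AND requestid = 'ABCDEF12345678'
"""

query_id = athena.start_query_execution(
    QueryString=query,
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)["QueryExecutionId"]

# Poll until the query finishes, then print the matching rows
while True:
    status = athena.get_query_execution(QueryExecutionId=query_id)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    for row in rows:  # first row is the header
        print([col.get("VarCharValue") for col in row["Data"]])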
CloudTrail records API calls from all services, not just S3, so it can be a useful tool for diagnosing issues beyond S3 as well. Note that it can take up to 15 minutes for logs to appear in CloudTrail.