AWS Kibana/ES: trying to create a policy but getting "Authorization Exception"

I created an AWS ES cluster via Terraform, VPC version.
That gave me a Kibana instance which I can access through a URL.
I access it via a proxy, as the cluster is in a VPC and thus not publicly accessible.
All good. But recently I ran out of disk: the infamous Write Status went red, and nothing was being written into the cluster anymore.
As this is a dev environment, I googled and found the easiest possible way to fix this:
curl -XDELETE <URL>/*
So far so good, logs are being written again.
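As a side note, the standard Elasticsearch APIs can confirm the disk pressure and show which indices take the space (a sketch; <URL> is the domain endpoint as above):
curl "<URL>/_cluster/health?pretty"
curl "<URL>/_cat/allocation?v"
curl "<URL>/_cat/indices?v&s=store.size:desc"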
But I then thought I should fix this properly, so I did some more reading and wanted to create an Index State Management policy. I took the default one and only changed the notification destination.
But when hitting "Create Policy" I get:
Sorry, there was an error
Authorization Exception
Which is quite odd, as AWS just created a Kibana instance with no user management whatsoever - so I would assume I have all rights.
Any idea?

Indeed, we had to ask support. The reason it was failing was that - as this is a dev environment and not production - we had no dedicated master nodes and no UltraWarm storage. The sample policy I was trying to install moves indices from hot to warm, which apparently actually means UltraWarm, and thus needs UltraWarm storage enabled on the domain.
A rather misleading error message, though.
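For reference, the hot-to-warm part of such a policy looks roughly like this when created through the ISM API rather than the Kibana UI (a sketch, not the exact sample policy; <URL> is the domain endpoint, and the warm_migration action is the step that needs UltraWarm enabled on the domain):
curl -XPUT "<URL>/_opendistro/_ism/policies/hot_to_warm" -H 'Content-Type: application/json' -d '
{
  "policy": {
    "description": "Move indices to UltraWarm after 7 days",
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "actions": [],
        "transitions": [{ "state_name": "warm", "conditions": { "min_index_age": "7d" } }]
      },
      {
        "name": "warm",
        "actions": [{ "warm_migration": {} }],
        "transitions": []
      }
    ]
  }
}'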

Related

I can't find and disable AWS resources

My free AWS tier is going to expire in 8 days. I removed every EC2 resource and Elastic IP associated with it, because those are what I recall initializing and experimenting with. I deleted all the roles I created because, as I understand it, roles permit AWS to perform actions for AWS services. And yet, when I go to the billing page it shows I have these three services in current usage.
(Screenshot: https://i.stack.imgur.com/RvKZc.png)
I used the script recommended by the AWS documentation to check for all instances, and it shows "no resources found".
Link for script: https://docs.aws.amazon.com/systems-manager-automation-runbooks/latest/userguide/automation-awssupport-listec2resources.html
I tried searching for each service using the dashboard and didn't get anywhere. I found an S3 bucket that I don't remember creating, but I deleted it anyway, and I still get the same output.
Any help is much appreciated.
OK, I was able to get in touch with AWS support via live chat, and they informed me that those services in my billing were usage generated before the services were terminated. AWS support was much faster than I expected.
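Incidentally, for anyone doing the same hunt, the tagging API can enumerate most tagged resources in a region from the CLI, and the EC2 calls confirm there are no instances or Elastic IPs left (a sketch; repeat per region you used, and note the tagging API only lists resources that carry tags):
aws resourcegroupstaggingapi get-resources --region us-east-1
aws ec2 describe-instances --region us-east-1 --query 'Reservations[].Instances[].InstanceId'
aws ec2 describe-addresses --region us-east-1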

Why am I getting 403s when trying to access an Aurora Serverless database from a Lambda but not from the query editor?

I've spun up an Aurora Serverless Postgres-compatible database and I'm trying to connect to it from a Lambda function, but I am getting AccessDenied errors:
AccessDeniedException:
Status code: 403, request id: 2b19fa38-af7d-4f4a-aaa5-7d068e92c901
Details:
I can connect to and query the database manually via the query editor if I use the same secret ARN and database name that the Lambda is trying to use. I've triple-checked that the ARNs are correct.
My Lambdas are not in the VPC but are using the Data API. The RDS cluster is in the default VPC.
I've temporarily given my Lambdas administrator access so that I know it's not a policy-based issue on the Lambda side of things.
CloudWatch does not contain any additional details on the error.
I am able to query the database from the command line of my personal computer (not in the VPC).
Any suggestions? Perhaps there is a way to get better details out of the error?
Aha! After trying to connect via the command line and succeeding, I realized this had to be something non-network-related. Digging into my code a bit, I eventually realized there wasn't anything wrong with the connection portions of the code, but rather with the user permissions being used to create the session/service that attempted to access the data. In hindsight, I suppose the explicit AccessDenied (instead of a timeout) should have been a clue that I was able to reach the database, just not able to do anything with it.
After digging in I discovered these two things are very different:
AmazonRDSFullAccess
AmazonRDSDataFullAccess
If you want to use the Data API, you have to have the AmazonRDSDataFullAccess (or similar) policy. AmazonRDSFullAccess is not a superset of the AmazonRDSDataFullAccess permissions, as one might assume. (If you look at the JSON for the AmazonRDSFullAccess policy you'll notice its permissions cover rds:*, while the other policy covers rds-data:*, so these are entirely different permission namespaces.)
TL;DR: Use the AmazonRDSDataFullAccess policy (or similar) to access the Data API. AmazonRDSFullAccess will not work.
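For completeness, attaching the managed policy and exercising the Data API from the CLI looks roughly like this (a sketch; the role name, ARNs and database name are placeholders for the real ones):
aws iam attach-role-policy \
  --role-name my-lambda-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonRDSDataFullAccess
aws rds-data execute-statement \
  --resource-arn arn:aws:rds:us-east-1:123456789012:cluster:my-cluster \
  --secret-arn arn:aws:secretsmanager:us-east-1:123456789012:secret:my-secret \
  --database mydb \
  --sql "SELECT 1"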
I think you need to put your Lambda in the same VPC as your serverless DB. I did a quick test and was able to connect to it from an EC2 instance in the same VPC.
ubuntu@ip-172-31-5-146:~$ telnet database-11.cluster-ckuv4ugsg77i.ap-northeast-1.rds.amazonaws.com 5432
Trying 172.31.14.180...
Connected to vpce-0403cfe830963dfe9-u0hmgbbx.vpce-svc-0445a873575e0c4b1.ap-northeast-1.vpce.amazonaws.com.
Escape character is '^]'.
^CConnection closed by foreign host.
This is my security group.
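For anyone reproducing this from the CLI, the equivalent ingress rule is roughly the following (a sketch; the group IDs are placeholders, with the source group being the one attached to whatever needs to reach Postgres):
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp \
  --port 5432 \
  --source-group sg-0fedcba9876543210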

Permission failure when pulling from gcr.io

I have 2 VMs running on Google Compute Engine. They are identical except for the fact that they are running under different service accounts.
Both of those service accounts have (as far as I can tell) identical permissions on the buckets used by gcr.io.
The init script that runs when the VM starts up pulls a Docker container from gcr.io. On the VM running as data-dev-dp@project-id.iam.gserviceaccount.com the pull succeeds:
Unable to find image 'gcr.io/project-id/gdp/jupyterlab-py2-spark-notebook:1.9' locally
1.9: Pulling from project-id/gdp/jupyterlab-py2-spark-notebook
bc51dd8edc1b: Pulling fs layer
b56e3f6802e3: Pulling fs layer
On the VM running as data-dev-cmp@project-id.iam.gserviceaccount.com the pull fails:
Unable to find image 'gcr.io/project-id/gdp/jupyterlab-py2-spark-notebook:1.9' locally
/usr/bin/docker: Error response from daemon: pull access denied for gcr.io/project-id/gdp/jupyterlab-py2-spark-notebook, repository does not exist or may require 'docker login': denied: Permission denied for "1.9" from request "/v2/project-id/gdp/jupyterlab-py2-spark-notebook/manifests/1.9"
I was under the impression that having identical permissions on the bucket should be sufficient, hence I'm wondering what other permissions are required to make this work. Could anyone suggest something?
UPDATE. I used toolbox (https://cloud.google.com/container-optimized-os/docs/how-to/toolbox) to verify that the permissions on the bucket are not the same for those two accounts:
# gsutil ls gs://artifacts.project-id.appspot.com
gs://artifacts.project-id.appspot.com/containers/
# gsutil ls gs://artifacts.project-id.appspot.com
AccessDeniedException: 403 data-dev-cmp@project-id.iam.gserviceaccount.com does not have storage.objects.list access to artifacts.project-id.appspot.com.
Clearly that's the cause of the issue, though I find it very strange that my screenshots above from the GCP Console suggest otherwise. I am continuing to investigate.
This turned out to be a problem that is all too familiar to us, because we are constantly creating infrastructure, tearing it down, and standing it up again. When doing so, particularly when those operations don't occur cleanly (as was the case today), we can end up with roles assigned to an old instance of a service account. The console will tell you that the account has the roles assigned, but that's actually not the case.
The solution on this occasion was to tear down all the infrastructure cleanly then recreate it again, including the service account that was exhibiting the problem.
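In case it helps, this is roughly how the binding on the registry bucket can be checked and re-granted from the CLI (a sketch; roles/storage.objectViewer is assumed to be the role intended for pulls here):
gcloud projects get-iam-policy project-id \
  --flatten="bindings[].members" \
  --filter="bindings.members:data-dev-cmp@project-id.iam.gserviceaccount.com" \
  --format="table(bindings.role)"
gsutil iam ch \
  serviceAccount:data-dev-cmp@project-id.iam.gserviceaccount.com:objectViewer \
  gs://artifacts.project-id.appspot.com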

Terraform destroy to get error response from AWS API?

AWS won't let you delete a VPC if there are instances in it.
If I create a non-TF-managed instance in a VPC (that I did create with Terraform) and then do a terraform destroy, TF hangs waiting.
I can then go to the AWS console and manually delete the VPC, and I get a useful response from AWS as to why it can't be deleted, along with a list of the offending resources I can manually delete.
Is there a verbose switch where Terraform would spit out these messages from the AWS API? I assume the AWS API returns this info, but perhaps it only does that when deleting via the console?
I haven't found any info on how to make the TF destroy command return this info, so I'm assuming it's probably not possible, but wanted to confirm.
You can get more information from Terraform by setting the TF_LOG environment variable before executing terraform. There are a few levels of logging, which should look familiar if you know syslog severity levels (INFO, WARN, ERROR, etc.). Setting this variable is a very useful debugging strategy.
Setting TF_LOG=DEBUG should at least let you determine which AWS API calls are being made. In my experience with Terraform, it's not uncommon for an API call to fail; Terraform sometimes won't report an error, hangs, or does report an error but with information that is cryptic at best. This is something the Terraform community is working on, and there are open GitHub issues describing similar behavior.
If, after setting the TF_LOG environment variable, the API call is indeed failing, I suggest you open a GitHub issue with Terraform, formatted according to the issue contributing guidelines.
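A minimal sketch of that workflow (TF_LOG_PATH keeps the very verbose output in a file instead of the terminal):
export TF_LOG=DEBUG
export TF_LOG_PATH=./terraform-destroy.log
terraform destroy
grep -i error terraform-destroy.log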

The service role arn:aws:iam::20011470201:role/deploy doesn't have permission to perform the following operation: autoscaling:DescribeLifecycleHooks

Has anyone come across the below error before?
The service role arn:aws:iam::20011470201:role/deploy doesn't have permission to perform the following operation: autoscaling:DescribeLifecycleHooks
I have CodeDeploy set up between my Bitbucket account and my Amazon AWS instance.
I am able to deploy to the test server everyday without issue.
But when I try to add the instance of our production server to the list of instances, I get the above error.
Note: I have added this instance and successfully deployed the code to it in the past; I'm not sure why I get this error now.
Any directions/hints on how to solve this would be appreciated.
Not sure how I missed it, but the policy I had defined was missing "autoscaling:DescribeLifecycleHooks"; once I added this to the existing permissions, everything worked fine.
Then again, the policy has not changed in well over a year, so I'm not sure why AWS did not complain about this earlier.
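For anyone hitting the same thing, the kind of statement that needs adding looks roughly like this (a sketch; the policy name is a placeholder, the role name is taken from the error message above, and attaching the AWS-managed AWSCodeDeployRole policy to the service role should also cover it):
aws iam put-role-policy \
  --role-name deploy \
  --policy-name codedeploy-lifecycle-hooks \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Action": "autoscaling:DescribeLifecycleHooks",
        "Resource": "*"
      }
    ]
  }'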
We recently rolled out a fix for permission issues between CodeDeploy and Auto Scaling. Previously, CodeDeploy did not require autoscaling:DescribeLifecycleHooks to describe or create a lifecycle hook in Auto Scaling when a customer's deployment group contains Auto Scaling groups. We now require this permission, which is the correct and expected behavior. Adding the proper permission fixes the problem.
Thanks,
Binbin
I see that you fixed this. Can you paste an example config here so noobs like me know just how to place this bit of code? Oh, and I can't comment on your accepted solution yet, not enough points...