boto3/aws: resource vs session - amazon-web-services

I can use a resource like this:
s3_resource = boto3.resource('s3')
s3_bucket = s3_resource.Bucket(bucket)
I can also use a session like this:
session = boto3.session.Session()
s3_session = session.resource("s3", endpoint_url=self.endpoint_url)
s3_obj = s3_session.Object(self.bucket, key)
Internally, does session.resource("s3") use boto3.resource('s3')?

Normally, people ask about boto3 client vs resource.
Calls using a client are direct API calls to AWS, while a resource is a higher-level, more Pythonic way of accessing the same information.
In your examples you are using a session, which is merely a way of caching credentials. The session can then be used for either a client or a resource.
For example, when calling the AWS STS assume_role() command, a set of temporary credentials is returned. These can be stored in a session, and API calls can then be made using those credentials.
There is effectively no difference between your code samples, since no specific information has been stored in the session object. If you have nothing to configure on the session, you can skip it entirely.
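To answer the internals question directly: boto3.resource('s3') creates (or reuses) a default Session and calls resource() on it, so your two samples take the same code path (apart from the endpoint_url you pass in the second one). A minimal sketch of the equivalence:
import boto3

# Module-level helper: boto3 creates/reuses a default Session under the hood
s3_a = boto3.resource('s3')

# Explicit session: the same call, just on a session you control
session = boto3.session.Session()
s3_b = session.resource('s3')

print(s3_a)  # s3.ServiceResource()
print(s3_b)  # s3.ServiceResource()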

Related

How can I reference mounted secrets from Secret Manager in a python Cloud Function?

I'm calling a series of APIs and would like peace of mind about key security, so I am storing the keys in Secret Manager. However, the documentation doesn't specify the best method of connecting to a mounted path from within a Cloud Function.
Suppose my secret is named key6 and has a mount path of /api/secret/key6 - how would I read this in Python?
I attempted this method: https://cloud.google.com/secret-manager/docs/creating-and-accessing-secrets#secretmanager-create-secret-python
However, given that this didn't use the mounted path, I wanted to see if there was a better implementation.
Reading the secret is done with standard file operations in Python. So if the mount path is /api/secret/key6, you could do something like:
secret_location = '/api/secret/key6'
with open(secret_location) as f:
    YOUR_SECRET = f.read().strip()  # strip the trailing newline, if any
Just ensure that you have given the service account running your Cloud Function the necessary permissions to access the secret.
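For reference, the API-based method from the link in the question (no volume mount involved) looks roughly like this — a sketch assuming the google-cloud-secret-manager client library, with PROJECT_ID as a placeholder; the file read shown above avoids this extra network call:
from google.cloud import secretmanager

client = secretmanager.SecretManagerServiceClient()
# PROJECT_ID is a placeholder for your GCP project ID
name = "projects/PROJECT_ID/secrets/key6/versions/latest"
response = client.access_secret_version(request={"name": name})
secret = response.payload.data.decode("UTF-8")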

How can I ensure that my retrieval of secrets is secure?

Currently I am using Terraform and AWS Secrets Manager to store and retrieve secrets, and I would like some insight into whether my implementation is secure, and if not, how to make it more secure. Let me illustrate with what I have tried.
In secrets.tf I create a secret like this (it needs to be applied with targeting):
resource "aws_secretsmanager_secret" "secrets_of_life" {
  name = "top-secret"
}
I then go to the console and manually set the secret value in AWS Secrets Manager.
I then retrieve the secret in secrets.tf like this:
data "aws_secretsmanager_secret_version" "secrets_of_life_version" {
  secret_id = aws_secretsmanager_secret.secrets_of_life.id
}

locals {
  creds = jsondecode(data.aws_secretsmanager_secret_version.secrets_of_life_version.secret_string)
}
And then I proceed to use the secret (export them as K8s secrets for example) like:
resource "kubernetes_secret" "secret_credentials" {
  metadata {
    name      = "kubernetes-secret" # underscores are not valid in Kubernetes names
    namespace = kubernetes_namespace.some_namespace.id
  }
  data = {
    top_secret = local.creds["SECRET_OF_LIFE"]
  }
  type = "Opaque" # the generic key/value secret type
}
It's worth mentioning that I store tf state remotely. Is my implementation secure? If not, how can I make it more secure?
Yes, I can confirm it is secure, since you have accomplished the following:
Plain-text secrets are kept out of your code.
Your secrets are stored in a dedicated secret store that enforces encryption and strict access control.
Everything is defined in the code itself. There are no extra manual steps or wrapper scripts required.
Secrets Manager supports rotating secrets, which is useful in case a secret gets compromised.
The only things I would add are using a Terraform backend that supports encryption, such as S3, and avoiding committing the state file to your source control.
Looks good; as asri suggests, it is a good, secure implementation.
The risk of exposure is in the remote state: the secret may be stored there in plain text. Assuming you are using S3, make sure the bucket is encrypted. Also note that if you share access to the Terraform state with other developers, they may be able to read those values from the remote state file.
From https://blog.gruntwork.io/a-comprehensive-guide-to-managing-secrets-in-your-terraform-code-1d586955ace1
These secrets will still end up in terraform.tfstate in plain text! This has been an open issue for more than 6 years now, with no clear plans for a first-class solution. There are some workarounds out there that can scrub secrets from your state files, but these are brittle and likely to break with each new Terraform release, so I don’t recommend them.
Hi, I'm working on similar things; here are some thoughts:
When running Terraform for the second time, the secret will be present in plain text in the state files stored in S3. Is S3 safe enough for storing those sensitive strings?
My work uses a similar approach: run Terraform to create an empty secret / dummy strings as a placeholder -> manually update it to the real credentials -> run Terraform again so the resource uses the updated credentials. The problem is that when we deploy to production we want the process to be as automated as possible; this approach is not ideal, but I haven't figured out a better way.
If anyone has better ideas, please feel free to leave a comment below.

When to use boto3 sessions explicitly

By default boto3 creates sessions whenever required. According to the documentation:
it is possible and recommended to maintain your own session(s) in some scenarios
My understanding is that if I use a session I created myself, I can reuse that same session across the application instead of boto3 automatically creating multiple sessions, and I can also pass in credentials from code.
Has anyone ever maintained sessions on their own? If so, what advantage did it provide apart from the ones mentioned above?
# Option 1: the module-level function, which uses the default session internally
secrets_manager = boto3.client('secretsmanager')

# Option 2: an explicitly created session
session = boto3.session.Session()
secrets_manager = session.client('secretsmanager')
Is there any advantage to using one over the other, and which one is recommended in this case?
References: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/session.html
I have seen the second method used when you wish to provide specific credentials without using the standard credential provider chain.
For example, when assuming a role, you can use the newly returned temporary credentials to create a session, then create a client from that session.
From boto3 sessions and aws_session_token management:
import boto3

role_info = {
    'RoleArn': 'arn:aws:iam::<AWS_ACCOUNT_NUMBER>:role/<AWS_ROLE_NAME>',
    'RoleSessionName': '<SOME_SESSION_NAME>'
}

client = boto3.client('sts')
credentials = client.assume_role(**role_info)

session = boto3.session.Session(
    aws_access_key_id=credentials['Credentials']['AccessKeyId'],
    aws_secret_access_key=credentials['Credentials']['SecretAccessKey'],
    aws_session_token=credentials['Credentials']['SessionToken']
)
You could then use: s3 = session.client('s3')
Here is an example where I needed to use the session object with both boto3 and AWS Data Wrangler to set the region for both:
import os

import awswrangler as wr
import boto3

REGION = os.environ.get("REGION")

session = boto3.Session(region_name=REGION)
client_rds = session.client("rds-data")
df = wr.s3.read_parquet(path=path, boto3_session=session)
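Another reason to maintain sessions yourself: each Session carries its own credentials and region, so a single application can talk to several accounts or regions side by side. A sketch with hypothetical profile names:
import boto3

# 'prod' and 'dev' are hypothetical profiles from ~/.aws/credentials
prod = boto3.session.Session(profile_name='prod', region_name='us-east-1')
dev = boto3.session.Session(profile_name='dev', region_name='eu-west-1')

# Each client inherits the credentials and region of its session
prod_s3 = prod.client('s3')
dev_s3 = dev.client('s3')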

When to use a boto3 client and when to use a boto3 resource?

I am trying to understand when I should use a Resource and when I should use a Client.
The definitions provided in the boto3 docs don't really make it clear when it is preferable to use one or the other.
boto3.resource is a high-level service class that wraps boto3.client.
It is meant to attach to a connected resource, so that you can later perform operations without specifying the original resource ID each time.
import boto3

s3 = boto3.resource("s3")
bucket = s3.Bucket('mybucket')

# bucket is now "attached" to the S3 bucket named "mybucket"
print(bucket)
# s3.Bucket(name='mybucket')

# show all the actions you may perform on the bucket
print(dir(bucket))
On the other hand, boto3.client is low-level: there is no "entry-class object", so you must explicitly specify the exact resource it connects to for every action you perform.
Which to use depends on your individual needs. However, boto3.resource doesn't wrap all of the boto3.client functionality, so sometimes you need to call boto3.client, or use boto3.resource.meta.client, to get the job done.
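For example, generating a presigned URL is a client-only operation, so even resource-based code has to drop down via meta.client — a minimal sketch, reusing the 'mybucket' name from above with a hypothetical key:
import boto3

s3 = boto3.resource('s3')

# The resource API has no presigned-URL method, but its underlying client does
url = s3.meta.client.generate_presigned_url(
    'get_object',
    Params={'Bucket': 'mybucket', 'Key': 'some-key'},
)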
If possible, use client over resource, especially if dealing with S3 object lists and then trying to get basic information on those objects themselves.
To list 10,000 objects, a client calls S3 10,000/1000 = 10 times, and each call already returns plenty of information about every object in the batch.
A resource, I assume, calls S3 10,000 times (or maybe the same 10 as the client?), but if you take each returned object and try to do something with it, that is probably another call to S3, making this about 20x slower than the client.
My test revealed the following results.
s3 = boto3.resource("s3")
s3bucket = s3.Bucket(myBucket)
s3obj_list = s3bucket.objects.filter(Prefix=key_prefix)
tmp_list = [s3obj.key for s3obj in s3obj_list]
(tmp_list = [s3obj for s3obj in s3obj_list] gives the same ~9 minute result)
Getting a list of 150,000 files this way took ~9 minutes. If s3obj_list really is pulling 1,000 files per call and buffering them, s3obj.key is probably not included in that batch and triggers another call.
# a loop that pages through the listing, passing ContinuationToken each time
keys = []
response = client.list_objects_v2(Bucket=bucket, Prefix=prefix)
while True:
    keys += [obj["Key"] for obj in response.get("Contents", [])]
    if not response.get("IsTruncated"):
        break
    response = client.list_objects_v2(
        Bucket=bucket,
        Prefix=prefix,
        ContinuationToken=response["NextContinuationToken"],
    )
The client took ~30 seconds to list the same 150,000 files.
I don't know if resource buffers 1,000 files at a time, but if it doesn't, that is a problem.
I also don't know whether resource can buffer the information attached to each object, but that is another problem.
I also don't know whether using pagination could make the client faster or easier to use (see the paginator sketch below).
If anyone knows the answers to the three questions above, please share; I'd be very interested to know.
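On the pagination question: boto3 ships a paginator for list_objects_v2 that manages ContinuationToken for you, with the same 1,000-keys-per-call batching, so it is mainly an ease-of-use win. A sketch of how it would slot into the test above:
paginator = client.get_paginator('list_objects_v2')
keys = []
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    # each page holds up to 1,000 objects
    keys += [obj['Key'] for obj in page.get('Contents', [])]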

trouble with AWS SWF using IAM roles

I've noticed on AWS that if I get IAM role credentials (key, secret, token) and set them as the appropriate environment variables in a Python script, I can create and use SWF Layer1 objects just fine. However, it looks like the Layer2 objects do not work. For example, with boto and os imported, if I do:
test = boto.swf.layer2.ActivityWorker()
test.domain = 'someDomain'
test.task_list = 'someTaskList'
test.poll()
I get an exception that the security token is not valid, and indeed, if I dig through the object, the security token is not set. This happens even with:
test = boto.swf.layer2.ActivityWorker(session_token=os.environ.get('AWS_SECURITY_TOKEN'))
I can fix this by doing:
test._swf.provider.security_token = os.environ.get('AWS_SECURITY_TOKEN')
test.poll()
but this seems pretty hacky and annoying, because I have to do it every time I make a new Layer2 object. Has anyone else noticed this? Is this behavior intended for some reason, or am I missing something here?
Manual management of temporary security credentials is not only "pretty hacky", but also less secure. A better alternative would be to assign an IAM role to the instances, so they automatically receive all permissions of that role without requiring explicit credentials.
See: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html
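A minimal sketch of what that looks like with the question's own code: with a role attached to the instance, boto resolves temporary credentials (including the security token) from the instance metadata service, so nothing needs to be set in the environment:
import boto.swf.layer2

# No explicit credentials or environment variables: boto falls back to its
# credential provider chain, which includes the EC2 instance metadata
# service, and picks up the security token automatically.
test = boto.swf.layer2.ActivityWorker()
test.domain = 'someDomain'
test.task_list = 'someTaskList'
test.poll()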