S3 recently announced a "bucket_key_enabled" option that caches the KMS key used to encrypt the bucket contents, so that the number of calls to the KMS server is reduced.
https://docs.aws.amazon.com/AmazonS3/latest/dev/bucket-key.html
So if the bucket is configured with
server-side encryption enabled by default, and
a KMS key "key/arn1" for that default encryption,
then by selecting "enable bucket key" we are caching "key/arn1", so that each object in this bucket does not require a call to the KMS server (internally there is presumably a time-to-live etc., but the crux is that this key is cached and KMS limit errors can thus be avoided).
Given all that, what is the point of overriding the KMS key at the object level while still having "bucket_key_enabled" set?
E.g.:
bucket/ -> kms1 & bucket_key_enabled
bucket/prefix1 -> kms2 & bucket_key_enabled
Does s3 actually cache the object-key to kms-key map?
To give you some context, I currently have an application which publishes data to the following structure:
bucket/user1
bucket/user2
While publishing to these prefixes, it explicitly passes the KMS key assigned to each user for every object upload (sketched in code below):
bucket/user1/obj1 with kms-user-1
bucket/user1/obj2 with kms-user-1
bucket/user1/obj3 with kms-user-1
bucket/user2/obj1 with kms-user-2
bucket/user2/obj2 with kms-user-2
bucket/user2/obj3 with kms-user-2
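Roughly, the upload code today looks like the following sketch (AWS SDK for Java v1; the bucket, key, file and ARN are placeholders):

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.PutObjectRequest;
import com.amazonaws.services.s3.model.SSEAwsKeyManagementParams;
import java.io.File;

public class PerUserKeyUpload {
    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        // Each user prefix has its own CMK; every upload names that user's key explicitly,
        // which today costs one GenerateDataKey call per object.
        String kmsUser1Arn = "arn:aws:kms:us-east-1:111122223333:key/kms-user-1";
        s3.putObject(new PutObjectRequest("bucket", "user1/obj1", new File("obj1.dat"))
                .withSSEAwsKeyManagementParams(new SSEAwsKeyManagementParams(kmsUser1Arn)));
    }
}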
If S3 is smart enough to reduce this to the following map,
bucket/user1 - kms-user-1
bucket/user2 - kms-user-2
then all I have to do is upgrade the SDK library to the latest version and add withBucketKeyEnabled(true) to the PutObjectRequest in the s3Client wrapper we have.
Let me know how it works internally so that we can make use of this feature wisely.
I finally went with upgrading the SDK to the latest version and passing withBucketKeyEnabled(true) to the putObject API calls.
I was able to prove with CloudTrail that the number of calls to the KMS server is the same regardless of whether encryption and bucketKeyEnabled are set at the bucket level or at "each" object level.
Scenario 1: KMS key and bucketKeyEnabled=true set at the bucket level; no encryption option passed on the putObject() call.
Calls made to GenerateDataKey() = 10
Calls made to Decrypt() = 60
Scenario 2: no encryption settings on the S3 bucket; for each putObject() call I pass the KMS key and bucketKeyEnabled=true.
PutObjectRequest(bucketName, key, inputStream, objectMetadata)
.withSSEAwsKeyManagementParams(SSEAwsKeyManagementParams(keyArn))
.withBucketKeyEnabled<PutObjectRequest>(true)
Calls made to GenerateDataKey() = 10
Calls made to Decrypt() = 60
Scenario 3: with the bucket key option omitted, as below,
PutObjectRequest(bucketName, key, inputStream, objectMetadata)
.withSSEAwsKeyManagementParams(SSEAwsKeyManagementParams(keyArn))
Calls made to GenerateDataKey() = 10011
Calls made to Decrypt() = 10002
Thus I was able to conclude that bucketKeyEnabled works regardless of whether you set it at the bucket level or the object level, although I do not know how it is optimized for both access patterns internally.
AWS Secrets Manager read and write concurrency.
If I am writing a new secret value to a secret and a read is performed at the same time, does the read call wait for the write call to complete?
Or will the read retrieve some invalid or intermediate value of the keys stored inside the secret?
By default, the GetSecretValue API returns the version of the secret that has the AWSCURRENT stage. It also allows you to fetch older versions of a secret by specifying the VersionId. Versions are also immutable and if you call PutSecretValue you create a new version.
You won't get partial versions here - the label AWSCURRENT will only be switched to the new version once the update is complete. Everything else would result in a terrible user experience.
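To make the versioning behaviour concrete, a small sketch with the AWS SDK for Java v1 (the secret name is a placeholder): the default read resolves whichever complete version currently carries AWSCURRENT, and older immutable versions can be pinned explicitly.

import com.amazonaws.services.secretsmanager.AWSSecretsManager;
import com.amazonaws.services.secretsmanager.AWSSecretsManagerClientBuilder;
import com.amazonaws.services.secretsmanager.model.GetSecretValueRequest;
import com.amazonaws.services.secretsmanager.model.GetSecretValueResult;

public class SecretVersionRead {
    public static void main(String[] args) {
        AWSSecretsManager sm = AWSSecretsManagerClientBuilder.defaultClient();
        String secretId = "my-app/db-credentials";

        // Default: returns the version that currently has the AWSCURRENT staging label.
        GetSecretValueResult current = sm.getSecretValue(
                new GetSecretValueRequest().withSecretId(secretId));

        // A previous (immutable) version can be fetched via a staging label or VersionId.
        GetSecretValueResult previous = sm.getSecretValue(
                new GetSecretValueRequest().withSecretId(secretId).withVersionStage("AWSPREVIOUS"));

        System.out.println(current.getVersionId() + " -> " + previous.getVersionId());
    }
}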
We have RSA key pairs generated on-prem and plan to sync them to GCP KMS. There is a yearly key rotation policy which would be carried out on-prem, and new key_versions would be synced to KMS. My concern is with the KMS API.
Problem: The API always asks for the 'key_version' as an argument to encrypt/decrypt a file.
Desired behaviour: during decryption, is it not possible that KMS sees the certificate thumbprint and picks the appropriate key version to decrypt a given encrypted file? E.g. a DEK wrapped with the RSA public key, when supplied to KMS, gets decrypted by the RSA private key (the KEK) of the correct version.
If yes, is there any documentation that elaborates on this use case?
According to the documentation, you can achieve that with symmetric decryption (no key version needs to be specified), but you can't with asymmetricDecrypt (the key version is required in the URL path of the API).
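To illustrate the difference, a minimal sketch with the google-cloud-kms Java client (project, key ring, key names, version number and ciphertext files are placeholders): the symmetric decrypt call names only the key and KMS resolves the version itself, while the asymmetric decrypt call has to name a specific version.

import com.google.cloud.kms.v1.AsymmetricDecryptResponse;
import com.google.cloud.kms.v1.CryptoKeyName;
import com.google.cloud.kms.v1.CryptoKeyVersionName;
import com.google.cloud.kms.v1.DecryptResponse;
import com.google.cloud.kms.v1.KeyManagementServiceClient;
import com.google.protobuf.ByteString;
import java.nio.file.Files;
import java.nio.file.Paths;

public class KmsVersionDemo {
    public static void main(String[] args) throws Exception {
        ByteString symCiphertext = ByteString.copyFrom(Files.readAllBytes(Paths.get("sym-ciphertext.bin")));
        ByteString rsaWrappedDek = ByteString.copyFrom(Files.readAllBytes(Paths.get("rsa-wrapped-dek.bin")));
        try (KeyManagementServiceClient client = KeyManagementServiceClient.create()) {
            // Symmetric decrypt: only the key is named; KMS works out which version produced the ciphertext.
            CryptoKeyName symKey = CryptoKeyName.of("my-project", "global", "my-ring", "my-symmetric-key");
            DecryptResponse sym = client.decrypt(symKey, symCiphertext);

            // Asymmetric decrypt: the version is part of the resource name, so the caller
            // has to record which version wrapped this DEK.
            CryptoKeyVersionName rsaVersion =
                    CryptoKeyVersionName.of("my-project", "global", "my-ring", "my-rsa-key", "2");
            AsymmetricDecryptResponse asym = client.asymmetricDecrypt(rsaVersion, rsaWrappedDek);

            System.out.println(sym.getPlaintext().size() + " / " + asym.getPlaintext().size());
        }
    }
}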
In short:
Snowflake provides encryption option when we are creating an external stage. Below are the options (from https://docs.snowflake.com/en/sql-reference/sql/create-stage.html)
[ ENCRYPTION = ( [ TYPE = 'AWS_CSE' ] [ MASTER_KEY = '<string>' ] |
                 [ TYPE = 'AWS_SSE_S3' ] |
                 [ TYPE = 'AWS_SSE_KMS' [ KMS_KEY_ID = '<string>' ] ] |
                 [ TYPE = 'NONE' ] ) ]
While we know AWS_CSE is used for client-side encryption (where data is encrypted/decrypted by the client using a customer-owned master key during put/get into/from Snowflake external stages), what is the use of the AWS_SSE_S3 and AWS_SSE_KMS options?
In detail:
In our scenario, we have an S3 bucket (OUR_S3_BUCKET) with encryption set at bucket level as SSE-KMS, created an incoming directory and uploaded a file covid_data.csv.
S3://OUR_S3_BUCKET/incoming/covid_data.csv.
To access this file, we have created a storage integration referring to the S3 bucket and have created three external stages in Snowflake.
EXTERNAL STAGE 1 (without encryption):
CREATE OR REPLACE STAGE TEST_STG_NOENC
URL='S3://OUR_S3_BUCKET/incoming/'
STORAGE_INTEGRATION = INBOUND_S3;
EXTERNAL STAGE 2 (with AWS_SSE_S3):
CREATE OR REPLACE STAGE TEST_STG_SSE_S3
URL='S3://OUR_S3_BUCKET/incoming/'
STORAGE_INTEGRATION = INBOUND_S3
ENCRYPTION = ( TYPE = 'AWS_SSE_S3');
EXTERNAL STAGE 3 (with AWS_SSE_KMS):
CREATE OR REPLACE STAGE TEST_STG_SSE_KMS
URL='S3://OUR_S3_BUCKET/incoming/'
STORAGE_INTEGRATION = INBOUND_S3
ENCRYPTION = ( TYPE = 'AWS_SSE_KMS' KMS_KEY_ID = 'arn:aws:kms:region:account_no:key/KMS_KEY_ID');
We are able to access the covid_data.csv data by selecting from all three external stages.
select t.$1, t.$2, t.$3
from @<<each of the 3 external stages>> (file_format => OUR_CSV_FILE_FORMAT ) t;
Even though our S3 bucket is encrypted using SSE-KMS, we are able to access the files using the stage (TEST_STG_NOENC) that has no encryption option.
In this scenario, what is the use of the AWS_SSE_S3 and AWS_SSE_KMS encryption options and how do they help?
We had an interesting discussion with Snowflake support regarding this topic, so we thought we might as well share what we found here:
As of right now (August 2022):
The ENCRYPTION setting in the stage configuration does not affect reading from the stage into Snowflake. If the policies are properly configured on the AWS side, reading from Snowflake should work regardless of what settings are configured on the Snowflake stage. For a bucket encrypted with a KMS key, this basically means the role used by Snowflake needs the rights to access the bucket, its objects, as well as the KMS keys that were used to encrypt the objects Snowflake will need to read (please note that each object in a bucket can be encrypted differently).
The ENCRYPTION setting in the stage configuration is used when Snowflake writes to the S3-backed stage. Depending on the configuration of the bucket and that of the stage, different outcomes are possible, but basically the configuration of the stage seems to win and override the default S3 configuration. If the stage does not specify anything regarding encryption, the default S3 configuration is used.
Please note that for the "writing" part, it is best to test your own use case in order to be certain of the outcome before implementing the solution in production.
Currently, the Snowflake docs covering these parameters are not very clear about what happens when the configurations mismatch. We will ask them to update the documentation so that the behavior can be predicted just by reading it.
We name S3 objects with the birthdays of employees. That is a bad idea, and we want to avoid creating object names with sensitive data. Is it safe to store the sensitive data in S3 user-defined metadata, or to add an S3 bucket policy that denies the action s3:GetObject? Which will work?
As you mentioned, it's not a good idea to create object names with sensitive data, but it's OK... not too bad either. I would suggest removing the list permission (s3:ListBucket) from the S3 policy. The policy should only allow GetObject, which means anyone can get an object ONLY when they already know the object name, i.e. the caller already knows the user's DOB.
With the list permission, a caller can list all the objects in the bucket and obtain the DOBs of the users.
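As a sketch of that idea (the bucket name is a placeholder; this uses the AWS SDK for Java v1 policy builder, and it makes the objects publicly readable, which the next answer rightly cautions against):

import com.amazonaws.auth.policy.Policy;
import com.amazonaws.auth.policy.Principal;
import com.amazonaws.auth.policy.Resource;
import com.amazonaws.auth.policy.Statement;
import com.amazonaws.auth.policy.Statement.Effect;
import com.amazonaws.auth.policy.actions.S3Actions;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class GetOnlyBucketPolicy {
    public static void main(String[] args) {
        String bucket = "employee-data-bucket";
        // Allow s3:GetObject on objects only; deliberately no s3:ListBucket on the bucket,
        // so a caller must already know the exact object name (i.e. the DOB).
        Policy policy = new Policy().withStatements(
                new Statement(Effect.Allow)
                        .withPrincipals(Principal.AllUsers)
                        .withActions(S3Actions.GetObject)
                        .withResources(new Resource("arn:aws:s3:::" + bucket + "/*")));
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        s3.setBucketPolicy(bucket, policy.toJson());
    }
}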
Object keys and user metadata should not be used for sensitive data. The reasoning behind object keys is readily apparent, but metadata may be less obvious:
metadata is returned in the HTTP headers every time an object is fetched. This can't be disabled, but it can be worked around with CloudFront and Lambda@Edge response triggers, which can be used to redact the metadata when the object is downloaded through CloudFront; however,
metadata is not stored encrypted in S3, even if the object itself is encrypted.
Object tags are also not appropriate for sensitive data, because they are also not stored encrypted. Object tags are useful for flagging objects that contain sensitive data, because tags can be used in policies to control access permissions on the object, but this is only relevant when the object itself contains the sensitive data.
In the case where "sensitive" means "proprietary" rather than "personal," tags can be an acceptable place for data... this might be data that is considered sensitive from a business perspective but that does not need to be stored encrypted, such as the identification of a specific software version that created the object. (I use this strategy so that if a version of code is determined later to have a bug, I can identify which objects might have been impacted because they were generated by that version). You might want to keep this information proprietary but it would not be "sensitive" in this context.
If your S3 bucket is used to store private data and you're allowing public access to the bucket, that is always a bad idea - it's basically security by obscurity.
Instead of changing your existing S3 structure, you could lock down the bucket to just your app and then serve the data via CloudFront signed URLs.
Basically, in your code where you currently inject the S3 URL, you can instead call the AWS API to create a signed URL from the S3 URL and a policy, and send this new URL to the end user. This masks the S3 URL, and you can enforce other restrictions such as how long the link is valid, require a specific header, or limit access to a specific IP, etc. You also get CDN edge caching and reduced costs as side benefits.
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/PrivateContent.html
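A rough sketch of generating such a URL with the AWS SDK for Java v1 (the distribution domain, key pair ID, key file and object key are all placeholders; the CloudFront distribution is assumed to front the locked-down bucket):

import com.amazonaws.services.cloudfront.CloudFrontUrlSigner;
import com.amazonaws.services.cloudfront.util.SignerUtils;
import java.io.File;
import java.util.Date;

public class SignedUrlExample {
    public static void main(String[] args) throws Exception {
        String distributionDomain = "d111111abcdef8.cloudfront.net";
        String s3ObjectKey = "user1/report.pdf";
        String keyPairId = "APKAEXAMPLEKEYID";
        File privateKeyFile = new File("cloudfront-private-key.pem");
        Date expiresAt = new Date(System.currentTimeMillis() + 15 * 60 * 1000); // link valid for 15 minutes

        // Hand out this short-lived signed URL instead of the raw S3 URL.
        String signedUrl = CloudFrontUrlSigner.getSignedURLWithCannedPolicy(
                SignerUtils.Protocol.https, distributionDomain, privateKeyFile,
                s3ObjectKey, keyPairId, expiresAt);
        System.out.println(signedUrl);
    }
}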
I'm using the Amazon Encryption SDK to encrypt data before storing it in a database. I'm also using Amazon KMS. As part of the encryption process, the SDK stores the Key Provider ID of the data key used to encrypt in the generated cipher-text header.
As described in the documentation here http://docs.aws.amazon.com/encryption-sdk/latest/developer-guide/message-format.html#header-structure
The encryption operations in the AWS Encryption SDK return a single
data structure or message that contains the encrypted data
(ciphertext) and all encrypted data keys. To understand this data
structure, or to build libraries that read and write it, you need to
understand the message format.
The message format consists of at least two parts: a header and a
body. In some cases, the message format consists of a third part, a
footer.
The Key Provider ID value contains the Amazon Resource Name (ARN) of the AWS KMS customer master key (CMK).
Here is where the issue comes in. Right now I have two different KMS regions available for encryption. Each Key Provider ID has the exact same Encrypted Data Key value. So either key could be used to decrypt the data. However, the issue is with the ciphertext headers. Let's say I have KMS1 and KMS2. If I encrypt the data with the key provided by KMS1, then the Key Provider ID will be stored in the ciphertext header. If I attempt to decrypt the data with KMS2, even though the Encrypted Data Key is the same, the decryption will fail because the header does not contain the Key Provider for KMS2. It has the Key Provider ID for KMS1. It fails with this error:
com.amazonaws.encryptionsdk.exception.BadCiphertextException: Header integrity check failed.
at com.amazonaws.encryptionsdk.internal.DecryptionHandler.verifyHeaderIntegrity(DecryptionHandler.java:312) ~[application.jar:na]
at com.amazonaws.encryptionsdk.internal.DecryptionHandler.readHeaderFields(DecryptionHandler.java:389) ~[application.jar:na]
...
com.amazonaws.encryptionsdk.internal.DecryptionHandler.verifyHeaderIntegrity(DecryptionHandler.java:310) ~[application.jar:na]
... 16 common frames omitted
Caused by: javax.crypto.AEADBadTagException: Tag mismatch!
It fails to verify the header integrity. This is not good, because I was planning to have multiple KMS regions in case the KMS in one region fails. We duplicate our data across all our regions, and we thought we could decrypt with KMS in any of those regions as long as the encrypted data keys match. However, it looks like I'm locked into using only the KMS key that originally encrypted the data? How on earth can we scale this to multiple regions if we can only rely on a single KMS key?
I could include all the region master keys in the call to encrypt the data. That way, the headers would always match, although it would not reflect which KMS it's actually using. However, that's also not scalable, since we could add/remove regions in the future, and that would cause issues with all the data that's already encrypted.
Am I missing something? I've thought about this, and I want to solve this problem without crippling any integrity checks provided by the SDK/Encryption.
Update:
Based on a comment from @jarmod:
Using an alias doesn't work either, because we can only associate an alias with a key in the same region, and the header stores the resolved key ARN it points to anyway.
I'm reading this document and it says
Additionally, envelope encryption can help to design your application
for disaster recovery. You can move your encrypted data as-is between
Regions and only have to reencrypt the data keys with the
Region-specific CMKs
However, that's not accurate at all, because the Encryption SDK will fail to decrypt in a different region, since the Key Provider ID of the re-encrypted data keys will be totally different!
Apologies, since I'm not familiar with Java programming, but I believe there is some confusion about how you are using the KMS CMKs to encrypt (or decrypt) the data with keys from more than one region for DR.
When you use multiple master keys to encrypt plaintext, any one of the master keys can be used to decrypt the plaintext. Note that, only one master key (let's say MKey1) generates the plaintext data key which is used to encrypt the data. This plaintext data key is then encrypted by the other master key (MKey2) as well.
As a result, you will have encrypted data + encrypted data key (using MKey1) + encrypted data key (using MKey2).
If for some reason MKey1 is unavailable and you want to decrypt the ciphertext, SDK can be used to decrypt the encrypted data key using MKey2, which can decrypt the ciphertext.
So, yes, you have to specify multiple KMS CMK ARNs in your program if you want to use multiple KMS keys. The document you shared has an example as well, which I'm sure you are aware of.
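A minimal sketch of that multi-CMK pattern with the current AWS Encryption SDK for Java API (the ARNs are placeholders): encrypt under both region CMKs so the message carries two encrypted data keys, then decrypt with a provider that only knows one of them.

import com.amazonaws.encryptionsdk.AwsCrypto;
import com.amazonaws.encryptionsdk.CryptoResult;
import com.amazonaws.encryptionsdk.kms.KmsMasterKey;
import com.amazonaws.encryptionsdk.kms.KmsMasterKeyProvider;
import java.nio.charset.StandardCharsets;

public class MultiRegionEnvelope {
    public static void main(String[] args) {
        String usEast1Cmk = "arn:aws:kms:us-east-1:111122223333:key/aaaa-1111";
        String euWest1Cmk = "arn:aws:kms:eu-west-1:111122223333:key/bbbb-2222";

        AwsCrypto crypto = AwsCrypto.standard();

        // Encrypt under BOTH CMKs: one generates the data key, the other wraps the same
        // data key, so the ciphertext header contains two encrypted data keys.
        KmsMasterKeyProvider encryptProvider =
                KmsMasterKeyProvider.builder().buildStrict(usEast1Cmk, euWest1Cmk);
        CryptoResult<byte[], KmsMasterKey> encrypted =
                crypto.encryptData(encryptProvider, "hello".getBytes(StandardCharsets.UTF_8));

        // DR decrypt in eu-west-1 only: the SDK finds the matching encrypted data key in the header.
        KmsMasterKeyProvider decryptProvider = KmsMasterKeyProvider.builder().buildStrict(euWest1Cmk);
        byte[] plaintext = crypto.decryptData(decryptProvider, encrypted.getResult()).getResult();
        System.out.println(new String(plaintext, StandardCharsets.UTF_8));
    }
}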