I need to build an identity service that uses a customer-supplied key to encrypt sensitive ID values for storage in RDS, but that also lets us look up a record later using the plaintext ID. We'd like to use a simple deterministic encryption algorithm for this, but it looks like the KMS API doesn't allow you to specify the IV, so identical plaintext never encrypts to the same value twice.
We also have the requirement to look up the data using another non-secure value and retrieve the encrypted secure value and decrypt it - so one-way hashing is unfortunately not going to work.
Taken together, this means we can't look up the secure ID without iterating through all records by brute force, decrypting each one and comparing it to the plaintext value. What we wanted instead was to encrypt the plaintext search value with a known IV and use that encrypted value as an index to find the matching record in the database.
I'm guessing this is a pretty common requirement for things like SSNs, so how do people solve it?
Thanks in advance.

look up a record later using the plaintext ID
Then you are losing quite a bit of security. Maybe you could store a hash (e.g. SHA-256) of the ID alongside the encrypted data, which would make it easier to look up the record without making it possible to reverse the value.
This approach assumes that the ID is from a reasonably large message space (there are potentially a lot of IDs) so it is not feasible to create a map for every possible value
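To illustrate why the size of the message space matters, here is a minimal Python sketch. It assumes a hypothetical 4-digit numeric ID, and the function names are made up for the example. With so few possible values, anyone holding the stored hash can recover the ID simply by hashing every candidate.

```python
import hashlib
from typing import Optional


def sha256_hex(value: str) -> str:
    """Unkeyed SHA-256 of the ID, as it would be stored next to the ciphertext."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def recover_by_enumeration(target_hash: str, digits: int) -> Optional[str]:
    """Try every zero-padded numeric ID of the given length until one matches."""
    for n in range(10 ** digits):
        candidate = f"{n:0{digits}d}"
        if sha256_hex(candidate) == target_hash:
            return candidate
    return None


stored_hash = sha256_hex("0420")               # what the database would hold
recovered = recover_by_enumeration(stored_hash, 4)
```

A 9-digit SSN has only a billion candidates, so the same loop is still practical there. That is the argument for a keyed hash (HMAC) with a secret the attacker does not have.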
KMS API doesn't allow you to specify the IV so you can never get identical plaintext to encrypt to the same value twice.
Yes, KMS seems to supply its own IV for each ciphertext, which enforces good security practice.

If I understand your use case correctly, your flow is like this:
The customer provides a key K and you use this key to encrypt a secret S, which is stored in RDS with an associated ID.
Given a non-secret key K, you want to be able to look up S and decrypt it.
If the customer is reusing the key, this is actually not all that hard to accomplish.
Create a KMS key for the customer.
Use this KMS key to encrypt the customer's IV and the key the customer has specified, and store them in AWS Secrets Manager (ASM) - preferably namespaced in some way by customer. A JSON structure like this:
{
  "iv": "somerandomivvalue",
  "key": "somerandomkey"
}
would allow you to easily parse the values out. ASM also allows you to seamlessly perform key rotation - which is really nifty.
If you're paranoid, you could take a cryptographic hash of the customer name (or whatever) and namespace by that.
RDS now stores the numeric ID of the customer, the insecure values, and a namespace value (or some method of deriving the location) in ASM.
It goes without saying that you need to limit access to the secrets manager vault.
To employ the solution:
Customer issues request to read secure value.
Service accesses ASM and decrypts the secret for customer.
Service extracts IV and key
Service initialises cipher scheme with IV and key and decrypts customer data.
Benefits: You encrypt and decrypt the secret values in ASM with a KMS key under your full control, and you can store and recover whatever state you need to decrypt the customer values in a secure manner.
Others will probably have cryptographically better solutions, but this should do for a first attempt.

In the end we decided to keep using KMS with the customer-supplied key to encrypt and decrypt the sensitive ID column, but we also enabled the PostgreSQL pgcrypto extension to provide secure hashes for lookups. So in addition to our encrypted column we added an id_hash column, and we query the table something like this:
SELECT * FROM employee WHERE division_id = ??? AND id_hash = ENCODE(HMAC('SENSITIVE_ID+SECRET_SALT', 'SECRET_PASSPHRASE', 'sha256'), 'hex');
We could have done the hashing client-side but since the algorithm is key to later lookups we liked the simplicity of having the DB do the hashing for us.
Hope this is of use to anyone else looking for a solution.
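For anyone who prefers the client-side route mentioned above, here is a minimal Python sketch of the same keyed hash. It relies on pgcrypto's HMAC(data, key, type) argument order. The function name and the example values are placeholders, not anything from our system, and it assumes the ID and salt are simply concatenated as in the query.

```python
import hashlib
import hmac


def lookup_hash(sensitive_id: str, secret_salt: str, secret_passphrase: str) -> str:
    """Hex-encoded HMAC-SHA256 of (id + salt), keyed with the passphrase.

    Intended to mirror ENCODE(HMAC(id || salt, passphrase, 'sha256'), 'hex').
    """
    data = (sensitive_id + secret_salt).encode("utf-8")
    key = secret_passphrase.encode("utf-8")
    return hmac.new(key, data, hashlib.sha256).hexdigest()


# Placeholder values for illustration only.
id_hash = lookup_hash("123-45-6789", "example-salt", "example-passphrase")
```

The resulting string would be bound as the id_hash parameter in the WHERE clause. The same ID, salt, and passphrase always give the same hash, which is what makes the indexed lookup possible.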


How can I generate a HMAC key and secret key and share with client using AWS?

I am looking to generate an HMAC key and secret value to use as part of API request signatures. I want to be able to share the secret value and key with a 3rd party, so I need to access the value in plain text once. There would be one HMAC key per 3rd party, so the number could be large.
Option 1: I could generate this on the application side, but I don't want to store it in the DB. I was hoping to use AWS for storage, but I'm unsure what the process would be.
Option 2, Preferably I wanted to use AWS to generate the key and secret for HMAC as it can ensure uniqueness etc. I wanted it to provide the key and the secret one time. Looking at the documentation it seems to suggest that the secret value never leaves the HSM. Is my understanding correct or what is the best way to implement this using AWS?

Cryptographic Hash to verify identification key

Let's say I want to pass information to the user that includes the user's unique id. Then, I want to use that id for CRUD operations. Is it a viable, or even recommended, option to store a cryptographic hash of that data, which would remain static using something like SHA-2 and then verify that what the user passed to me was what I sent them? Or, should I never send them the information in the first place and just look up the information from a table?
My issue now is that I am using AWS Cognito with the sub as the unique identifier. So I do not want to 'trust' the end user to send me that sub after Cognito provides them with it.

AWS Encryption SDK Header Mismatch between Regions

I'm using the AWS Encryption SDK to encrypt data before storing it in a database. I'm also using AWS KMS. As part of the encryption process, the SDK stores the Key Provider ID of the data key used to encrypt in the generated ciphertext header.
As described in the documentation here
The encryption operations in the AWS Encryption SDK return a single
data structure or message that contains the encrypted data
(ciphertext) and all encrypted data keys. To understand this data
structure, or to build libraries that read and write it, you need to
understand the message format.
The message format consists of at least two parts: a header and a
body. In some cases, the message format consists of a third part, a
The Key Provider ID value contains the Amazon Resource Name (ARN) of the AWS KMS customer master key (CMK).
Here is where the issue comes in. Right now I have two different KMS regions available for encryption. Each Key Provider ID has the exact same Encrypted Data Key value. So either key could be used to decrypt the data. However, the issue is with the ciphertext headers. Let's say I have KMS1 and KMS2. If I encrypt the data with the key provided by KMS1, then the Key Provider ID will be stored in the ciphertext header. If I attempt to decrypt the data with KMS2, even though the Encrypted Data Key is the same, the decryption will fail because the header does not contain the Key Provider for KMS2. It has the Key Provider ID for KMS1. It fails with this error:
com.amazonaws.encryptionsdk.exception.BadCiphertextException: Header integrity check failed.
at com.amazonaws.encryptionsdk.internal.DecryptionHandler.verifyHeaderIntegrity( ~[application.jar:na]
at com.amazonaws.encryptionsdk.internal.DecryptionHandler.readHeaderFields( ~[application.jar:na]
com.amazonaws.encryptionsdk.internal.DecryptionHandler.verifyHeaderIntegrity( ~[application.jar:na]
... 16 common frames omitted
Caused by: javax.crypto.AEADBadTagException: Tag mismatch!
The header integrity check fails. This is not good, because I was planning to use KMS in multiple regions in case KMS fails in one region. We duplicate our data across all our regions, and we thought we could use KMS in any region to decrypt as long as the encrypted data keys match. However, it looks like I'm locked into using only the original KMS key that encrypted the data? How on earth can we scale this to multiple regions if we can only rely on a single KMS key?
I could include all the region master keys in the call to encrypt the data. That way, the headers would always match, although it would not reflect which KMS it's actually using. However, that's also not scalable, since we could add/remove regions in the future, and that would cause issues with all the data that's already encrypted.
Am I missing something? I've thought about this, and I want to solve this problem without crippling any integrity checks provided by the SDK/Encryption.
Based on a comment from @jarmod
Using an alias doesn't work either because we can only associate an alias to a key in the region, and it stores the resolved name of the key ARN it's pointing to anyway.
I'm reading this document and it says
Additionally, envelope encryption can help to design your application
for disaster recovery. You can move your encrypted data as-is between
Regions and only have to reencrypt the data keys with the
Region-specific CMKs
However, that's not accurate at all, because the encryption SDK will fail to decrypt on a different region because the Key Provider ID of the re-encrypted data keys will be totally different!
Apologies, since I'm not familiar with Java programming, but I believe there is some confusion about how you are using the KMS CMKs to encrypt (or decrypt) the data with keys from more than one region for DR.
When you use multiple master keys to encrypt plaintext, any one of the master keys can be used to decrypt the plaintext. Note that, only one master key (let's say MKey1) generates the plaintext data key which is used to encrypt the data. This plaintext data key is then encrypted by the other master key (MKey2) as well.
As a result, you will have encrypted data + encrypted data key (using MKey1) + encrypted data key (using MKey2).
If for some reason MKey1 is unavailable and you want to decrypt the ciphertext, SDK can be used to decrypt the encrypted data key using MKey2, which can decrypt the ciphertext.
So, yes, you have to specify multiple KMS CMK ARNs in your program if you want to use multiple KMS keys. The document you shared has an example as well, which I'm sure you are aware of.
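The structure described above can be sketched in Python as follows. This is not the AWS Encryption SDK and not real cryptography: toy_xor is a deliberately insecure stand-in for the KMS and AES calls, and Message, encrypt, and decrypt are hypothetical names chosen for the illustration. It only shows that one data key is wrapped once per master key, and that any one of those master keys is enough to decrypt.

```python
import hashlib
import secrets
from dataclasses import dataclass


def toy_xor(key: bytes, data: bytes) -> bytes:
    """Toy stand-in for real encryption (NOT secure): XOR with a SHAKE-256 keystream.

    Applying it twice with the same key returns the original data.
    """
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


@dataclass
class Message:
    ciphertext: bytes
    encrypted_data_keys: dict  # provider ID -> data key wrapped by that master key


def encrypt(plaintext: bytes, master_keys: dict) -> Message:
    data_key = secrets.token_bytes(32)  # a single plaintext data key per message
    wrapped = {pid: toy_xor(mk, data_key) for pid, mk in master_keys.items()}
    return Message(toy_xor(data_key, plaintext), wrapped)


def decrypt(message: Message, available_keys: dict) -> bytes:
    # Use the first wrapped data key for which we hold the matching master key.
    for provider_id, wrapped_key in message.encrypted_data_keys.items():
        master_key = available_keys.get(provider_id)
        if master_key is not None:
            data_key = toy_xor(master_key, wrapped_key)
            return toy_xor(data_key, message.ciphertext)
    raise KeyError("no available master key matches this message")


master_keys = {"MKey1": secrets.token_bytes(32), "MKey2": secrets.token_bytes(32)}
message = encrypt(b"sensitive value", master_keys)
# MKey1 is "unavailable" here: decrypt with MKey2 only.
plaintext = decrypt(message, {"MKey2": master_keys["MKey2"]})
```

In the real SDK the list of wrapped data keys lives in the message header, which is why every master key has to be supplied at encryption time.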

What can you do with AWS_ACCESS_KEY_ID without also having AWS_SECRET_ACCESS_KEY?

All the documentation about AWS keys seems to always tell you to have both the key id and the secret key. Are there any practical uses to have only the key id without the secret key? If not, why aren't the two combined into one ever so slightly more manageable single setting?
It seems to me that if you must ask the user to produce the secret, you might just as well ask for their key id in the same step.
More general:
All Amazon APIs only work with the access key + signature. The signature is the way you prove you also have the secret key. The secret key never goes over the wire.
If you would "combine" them in the same key you would not know what account the request is for. You would also have to send the secret key over the wire which, in general, is a very bad thing.
So basically the public (access) key serves as an account selector and the private key serves to prove you actually have access to the account.
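A simplified Python sketch of that split is shown below. It is not the actual AWS signing algorithm (Signature Version 4 has more steps). The key store, the key values, and the function names are hypothetical. The access key id travels with the request so the server can find the right secret, while the secret itself is only ever used to compute and check the signature.

```python
import hashlib
import hmac

# Hypothetical server-side store: access key id -> secret key.
SECRETS_BY_ACCESS_KEY = {"AKIDEXAMPLE": b"hypothetical-secret"}


def sign(secret_key: bytes, message: bytes) -> str:
    """Client side: prove possession of the secret without sending it."""
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


def verify(access_key_id: str, message: bytes, signature: str) -> bool:
    """Server side: select the account by key id, then recompute the signature."""
    secret_key = SECRETS_BY_ACCESS_KEY.get(access_key_id)
    if secret_key is None:
        return False  # unknown account
    return hmac.compare_digest(sign(secret_key, message), signature)


request = b"GET /items"
signature = sign(b"hypothetical-secret", request)
accepted = verify("AKIDEXAMPLE", request, signature)
```

Without the key id the server would have to try every stored secret to find out who sent the request.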

Multiuser access to encrypted data

I'm building a server-side application which requires the data to be stored encrypted in the database. When a client accesses the data, it also has to be transferred encrypted. Each client has a unique login.
My original idea is to store the data encrypted with a symmetric algorithm like AES. When a client wants to access the data, the encrypted data is transferred to the client, along with the key encrypted with the client's public key.
Is this a secure way to store and transfer the data, or is there a better solution to this problem?
Update: If following Søren's suggestion to keep a copy of the AES key encrypted using each client's public key, wouldn't that include the key to be stored somewhere in order to add additional clients or could that be generated in any way?
First you should start by defining some security properties you want to provide, for example:
Is it OK to give different users access to the same secret key? For example, if File1 is AES-encrypted with key K, is it a problem if user Alice and user Bob are both given K?
How do I revoke users from the system? (It turns out Bob from scenario 1 is actually a Chinese spy working for our company, how do I securely kick him out of the system).
Does the encrypted data that is saved in the database need to be searched? (This problem is well researched and hard to solve!)
How much (if any) and what plaintext data will be placed into the database to help organize it? Databases expect data to have unique keys associated with them. You need to make sure these keys don't leak information, but are useful enough to retrieve the data later.
How often should secret keys be changed? If you are storing files and multiple users are allowed access to encrypted files, what happens when user X modifies a file? Does the secret key change? Should the new key be sent to all users?
What happens when 2 users modify the same data at the same time? Will the database be able to handle this without modification?
There are many others.
If the server is not trusted and must never see plaintext data, then here's a general overview of a possible solution.
Let the clients manage the crypto completely. Clients authenticate with the server and are allowed to store data in the database. It is the responsibility of the client to make sure the data is encrypted.
In this scenario, keys should be saved securely only on the client's computer. If they must be placed elsewhere, a "master key" could be created.
Secure from what? You need to define your goals more clearly.
The solution would protect the data during transfer, but from your description, the server would have full access to the data (since it'd need to store the AES key unencrypted). In other words, a hacker or burglar with access to the server would have full access to the data.
If secure transmission is what you want, use an SSL / TLS wrapper around the database connection. This is a standard solution from all major vendors.
To secure the data server side, the server should not have the AES key. If the number of clients were limited, the server could store a copy of the AES key for every client, each copy of the key already encrypted with the public key of each client, such that the server never sees the plain text data nor any unencrypted AES keys.
That is indeed the common approach, e.g. also used by NTFS file encryption.