Handle encrypted JSON in AWS Glue Job

In our on-premises environment, JSON is generated for loan data and encrypted using a core crypto JAR; the encrypted JSON is saved into MySQL tables, and the same core crypto JAR is called from Java to decrypt the JSON values. We have now decided to use the Glue service for ETL. Can anyone help me with calling the core crypto JAR to decrypt the encrypted JSON during Glue execution?
How can we handle the above process in an AWS Glue ETL job?

You may need to use a custom script.
https://docs.aws.amazon.com/glue/latest/dg/console-custom-created.html
You can specify the jars that your script is dependent upon:
Dependent jars path: Comma-separated Amazon S3 paths to JAR files that are required by the script. (Note: currently, only pure Java or Scala 2.11 libraries can be used.)
Then create a Glue job as described here:
https://docs.aws.amazon.com/glue/latest/dg/add-job.html
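For illustration, here is a rough PySpark sketch of what the Glue script could look like once the JAR is listed under "Dependent jars path". The class com.example.crypto.DecryptUdf, the catalog database/table, and the column names are hypothetical; it assumes the JAR exposes (or is wrapped by) a Spark Java UDF class implementing org.apache.spark.sql.api.java.UDF1<String, String>:

```python
import sys
from awsglue.context import GlueContext
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql.types import StringType

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session

# Register the Java UDF from the dependent JAR so PySpark can call it on the executors.
spark.udf.registerJavaFunction("decrypt_json", "com.example.crypto.DecryptUdf", StringType())

# Read the table holding the encrypted JSON column (hypothetical catalog names).
df = glue_context.create_dynamic_frame.from_catalog(
    database="loans_db", table_name="encrypted_loans"
).toDF()
df.createOrReplaceTempView("loans")

# Decrypt the JSON column by calling the registered Java UDF.
decrypted = spark.sql("SELECT decrypt_json(encrypted_json) AS loan_json FROM loans")
```

If the core crypto JAR only exposes a plain static decrypt method, a thin Java or Scala UDF wrapper around it would be needed, since a Python UDF running on the executors cannot call arbitrary JVM code directly.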

Your system is no more secure if, at the end of the day, you need to upload your secret key to AWS to decrypt this JSON. You may as well not encrypt the JSON when you save it to the database, and instead configure the database to be encrypted with a customer-managed KMS key.
You'll get much more functionality this way, as you can log KMS key usage as well as restrict which services are able to decrypt the data. If you keep the secret in your JAR file, you will need that JAR file wherever you read this data, and you will end up distributing the secret to different places without the security controls or auditing that KMS gives you.
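As a rough boto3 illustration of the access control and auditing this buys you (the key ARN and role ARN below are hypothetical): only principals granted kms:Decrypt on the customer-managed key can read the data, and every use of the key shows up in CloudTrail.

```python
import boto3

kms = boto3.client("kms")

# Allow only the Glue job's role to decrypt with this customer-managed key.
kms.create_grant(
    KeyId="arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID",
    GranteePrincipal="arn:aws:iam::111122223333:role/glue-etl-job-role",
    Operations=["Decrypt"],
)

# Key usage is recorded in CloudTrail and can be audited after the fact.
cloudtrail = boto3.client("cloudtrail")
events = cloudtrail.lookup_events(
    LookupAttributes=[
        {"AttributeKey": "EventSource", "AttributeValue": "kms.amazonaws.com"}
    ]
)
```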

Related

Reading a KMS encrypted file from Google Cloud Dataflow

I went through this Google Cloud documentation, which mentions that:
Dataflow can access sources and sinks that are protected by Cloud KMS keys without you having to specify the Cloud KMS key of those sources and sinks, as long as you are not creating new objects.
I have a few questions regarding this:
Q.1. Does this mean we don't need to decrypt the encrypted source file within our Beam code? Does Dataflow have this functionality built in?
Q.2. If the source file is encrypted, will the output file from Dataflow be encrypted by default with the same key (let us say we have a symmetric key)?
Q.3. What are the objects being referred to here?
PS: I want to read from an encrypted AVRO file placed in the GCS bucket, apply my Apache Beam transforms from my code, and write an encrypted file back to the bucket.
Cloud Dataflow is a fully managed service; if encryption is not specified, it automatically applies Cloud KMS encryption. Cloud KMS is a cloud-hosted key management service that can manage both symmetric and asymmetric cryptographic keys.
When Cloud KMS is used with Cloud Dataflow, it allows you to encrypt the data that is to be processed in the Dataflow pipeline. Using Cloud KMS, data held in temporary storage such as Persistent Disk can also be encrypted, giving end-to-end protection of the data. You do not need to decrypt the source file within your Beam code: data from the sources is encrypted, and decryption is done automatically by Dataflow.
If you are using a symmetric key, a single key, managed by Cloud KMS and stored in ciphertext, is used for both encryption and decryption of the data. If you are using an asymmetric key, a public key is used to encrypt the data and a private key is used to decrypt it. You need to grant the Cloud KMS CryptoKey Encrypter/Decrypter role to the Dataflow service account before performing encryption and decryption. Cloud KMS automatically determines the key for decryption based on the provided ciphertext, so no extra handling is needed for decryption.
The objects mentioned here that can be encrypted by Cloud KMS include tables in BigQuery, files in Cloud Storage, and other data in the sources and sinks.
For more information you can check this blog.
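Putting this together for the AVRO-in/AVRO-out case, a minimal Beam Python sketch might look like the following. The bucket paths, project, key resource name, and the Avro schema are hypothetical; GCS decrypts CMEK-protected objects transparently when the worker service account has the Encrypter/Decrypter role on the key, and the --dataflow_kms_key option is used here to protect Dataflow's temporary state.

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    "--runner=DataflowRunner",
    "--project=my-project",
    "--region=us-central1",
    "--temp_location=gs://my-bucket/tmp",
    "--dataflow_kms_key=projects/my-project/locations/us-central1/keyRings/my-ring/cryptoKeys/my-key",
])

# Hypothetical schema for the records in the input AVRO file.
schema = {
    "type": "record",
    "name": "Loan",
    "fields": [{"name": "loan_id", "type": "string"}],
}

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadAvro" >> beam.io.ReadFromAvro("gs://my-bucket/input/loans-*.avro")
        | "Transform" >> beam.Map(lambda record: record)  # apply your transforms here
        | "WriteAvro" >> beam.io.WriteToAvro(
            "gs://my-bucket/output/loans", schema, file_name_suffix=".avro")
    )
```

Output objects written to the bucket pick up the bucket's default CMEK if one is configured; otherwise Google-managed encryption applies.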

Encryption of s3 object

I am storing data in a file in AWS S3 and have already enabled SSE, but I am curious to know whether there is a way to encrypt the data so that when someone downloads the file, they can't see the content. I am new to AWS, and it would be great if someone could give me some input.
Use the AWS Key Management Service (AWS KMS) to encrypt the data prior to uploading it to an Amazon S3 bucket. The data will then remain encrypted until it's decrypted using the key. You can find an example here (for the Java SDK):
https://github.com/awsdocs/aws-doc-sdk-examples/blob/main/javav2/example_code/s3/src/main/java/com/example/s3/KMSEncryptionExample.java
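If you are working in Python rather than Java, a comparable envelope-encryption sketch with boto3 could look like this; the KMS key alias, bucket, and object names are hypothetical, and the cryptography package is assumed to be available for the local AES step:

```python
import base64
import boto3
from cryptography.fernet import Fernet

kms = boto3.client("kms")
s3 = boto3.client("s3")

# Ask KMS for a data key: Plaintext encrypts locally, CiphertextBlob is stored
# with the object so the key can later be recovered via kms.decrypt().
data_key = kms.generate_data_key(KeyId="alias/my-cmk", KeySpec="AES_256")
fernet = Fernet(base64.urlsafe_b64encode(data_key["Plaintext"]))

with open("loans.json", "rb") as f:
    ciphertext = fernet.encrypt(f.read())

s3.put_object(
    Bucket="my-bucket",
    Key="loans.json.enc",
    Body=ciphertext,
    Metadata={"x-enc-key": base64.b64encode(data_key["CiphertextBlob"]).decode()},
)

# To read it back: fetch the object, decrypt the stored data key with KMS,
# then decrypt the content locally.
obj = s3.get_object(Bucket="my-bucket", Key="loans.json.enc")
blob = base64.b64decode(obj["Metadata"]["x-enc-key"])
plaintext_key = kms.decrypt(CiphertextBlob=blob)["Plaintext"]
data = Fernet(base64.urlsafe_b64encode(plaintext_key)).decrypt(obj["Body"].read())
```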
already enabled SSE.
SSE encrypts the content on S3, but an authenticated client could access the content in plaintext; the encryption is done under the hood and the client never sees the ciphertext (encrypted form).
You can use the default S3 key or a customer-managed KMS key (CMK), where the client needs explicit access to the key to decrypt the content.
download the file so they can't see the content?
Then the content needs to be encrypted before the upload. AWS provides some support for client-side encryption, but the client is free to implement its own encryption strategy and key management.
To avoid the trouble of managing keys on the client side, it is often more practical to stick with SSE and allow access to S3, or to the CMK used, only to identities that must access the content.
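For completeness, here is a short boto3 sketch of that SSE-KMS route (bucket, object, and key alias are hypothetical): the object is encrypted server-side with the customer-managed key, and only principals with kms:Decrypt on that key can read it back.

```python
import boto3

s3 = boto3.client("s3")

with open("loans.json", "rb") as f:
    s3.put_object(
        Bucket="my-bucket",
        Key="loans.json",
        Body=f,
        ServerSideEncryption="aws:kms",
        SSEKMSKeyId="alias/my-cmk",
    )

# get_object succeeds only if the caller has s3:GetObject on the bucket
# AND kms:Decrypt on the key; otherwise it fails with AccessDenied.
obj = s3.get_object(Bucket="my-bucket", Key="loans.json")
```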

Capture the S3 file download start and end times and other details

I want to expose an API (preferably using AWS API Gateway / Lambda / Go) to the users.
Using this API, the users can download a binary file from an S3 bucket.
I want to capture metrics such as which user started the download of a file and the times at which the download started and finished.
I want to record these timestamps in DynamoDB.
S3 has support for events for creating/modifying/deleting files, so I can write a Lambda function for those events.
But S3 doesn't seem to have support for read actions (e.g. downloading a file).
I am thinking of writing a Lambda function, which will be invoked when the user calls the API to download the file. In the Lambda, I want to record the timestamp, read the file into a buffer, encode it, and then send it as a base64-encoded response to the client. Roughly, it would look like the sketch below.
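(Python shown purely as an illustration of the idea; bucket, table, and field names are hypothetical, and how the user identity is obtained depends on the API Gateway authorizer in use.)

```python
import base64
import time
import boto3

s3 = boto3.client("s3")
table = boto3.resource("dynamodb").Table("download-audit")

def handler(event, context):
    user = event["requestContext"]["authorizer"]["principalId"]  # depends on your auth setup
    key = event["pathParameters"]["key"]

    started_at = int(time.time() * 1000)
    body = s3.get_object(Bucket="my-binary-bucket", Key=key)["Body"].read()
    finished_at = int(time.time() * 1000)

    # Caveat: this records how long the Lambda took to read from S3,
    # not when the client actually finished receiving the response.
    table.put_item(Item={
        "user": user,
        "key": key,
        "started_at": started_at,
        "finished_at": finished_at,
    })

    # API Gateway must be configured for binary media types for this to work.
    return {
        "statusCode": 200,
        "isBase64Encoded": True,
        "headers": {"Content-Type": "application/octet-stream"},
        "body": base64.b64encode(body).decode(),
    }
```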
Let me know if there is any better alternative approach.
Use Amazon S3 server access logging.
Don't use DynamoDB; if you need to query the logs in the target bucket, set up Redshift Spectrum to query the logs, which are also in S3.
Maybe you can use S3 access logs, and configure an event based on new records in the log bucket? However, these logs will not tell you whether the user has finished the download or not.
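Enabling server access logging is a one-off configuration; here is a minimal boto3 sketch (bucket names are hypothetical, and the target bucket must allow the S3 logging service to write to it):

```python
import boto3

s3 = boto3.client("s3")
s3.put_bucket_logging(
    Bucket="my-binary-bucket",
    BucketLoggingStatus={
        "LoggingEnabled": {
            "TargetBucket": "my-access-logs-bucket",
            "TargetPrefix": "downloads/",
        }
    },
)
# Downloads then show up as REST.GET.OBJECT records in the log files,
# which can be queried in place (for example with Athena or Redshift Spectrum).
```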

How to control the access of AWS secret manager

Suppose I am an AWS superuser who has all AWS permissions.
I have configured AWS Glue, including the connection to a database using a username and password.
I have stored the username and password in AWS Secrets Manager; the Glue ETL job script will connect to the database using this information and then run the ETL job.
ETL data engineers do not have superuser permissions, but they know how to write the details of the ETL job script, and the script needs to retrieve the secret first, which means the engineers can write code that prints out the password... and we have a lot of data engineers...
My question is: what is the right strategy to control access to the password in Secrets Manager?
1) Shall we allow ETL data engineers to deploy scripts to Glue and run them? Then they can see the password. Or
2) Shall we only allow them to write the ETL script, but have a superuser deploy the script to Glue after reviewing the code? Or
3) Do we have a way to separate the ETL job script code from the get_password code?
Note: I know how to use IAM and tags to control Secrets Manager, but my question is different.

Get object from encrypted bucket using boto2?

Is it possible to get files from encrypted S3 buckets using boto 2? I am working on a project that uses S3 in several places and has to read/write to an encrypted S3 bucket. I would like to make as small a change as possible, for the time being, to support encryption.
Encryption actually works at the object level, rather than the bucket level.
There are several ways to use encryption. If it is Protecting Data Using Server-Side Encryption with Amazon S3-Managed Encryption Keys (SSE-S3), then as long as your app has permission to access the object, it will be automatically decrypted. (The app won't even notice that it was encrypted!)
If it is Protecting Data Using Server-Side Encryption with AWS KMS–Managed Keys (SSE-KMS), the app will also need adequate permissions to use the key in KMS. The object will be automatically decrypted, but the app needs permission to use the key.
If the app is Protecting Data Using Server-Side Encryption with Customer-Provided Encryption Keys (SSE-C), then the app must provide the encryption key when it tries to access the object.
And finally, if it is Protecting Data Using Client-Side Encryption, then the app is totally responsible for encryption/decryption.
It is most likely that your data is using Server-Side Encryption with Amazon S3-Managed Encryption Keys (SSE-S3). If so, then your app doesn't have to do anything — it will all be handled automagically by Amazon S3.
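A small boto 2 sketch of how these cases play out in code (bucket/key names and the SSE-C key material are hypothetical): for SSE-S3 and SSE-KMS nothing changes in the read path, while SSE-C requires the per-request key headers.

```python
import base64
import hashlib
from boto.s3.connection import S3Connection

conn = S3Connection()  # uses the usual boto credential chain
bucket = conn.get_bucket("my-encrypted-bucket", validate=False)

# SSE-S3 / SSE-KMS: read as usual; S3 decrypts transparently
# (for SSE-KMS the caller additionally needs kms:Decrypt on the key).
key = bucket.get_key("loans/loan.json")
data = key.get_contents_as_string()

# SSE-C: the customer-provided key must accompany every request.
customer_key = b"0" * 32  # hypothetical 256-bit key
sse_c_headers = {
    "x-amz-server-side-encryption-customer-algorithm": "AES256",
    "x-amz-server-side-encryption-customer-key": base64.b64encode(customer_key).decode(),
    "x-amz-server-side-encryption-customer-key-MD5": base64.b64encode(
        hashlib.md5(customer_key).digest()).decode(),
}
key = bucket.get_key("loans/loan-ssec.json", headers=sse_c_headers)
data = key.get_contents_as_string(headers=sse_c_headers)
```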