Per the AWS documentation on enabling at-rest encryption for local disks in EMR, there are two methods specified. I am interested in using the open-source HDFS encryption.
Data encryption on HDFS block data transfer is set to true and is configured to use AES 256 encryption.
How is this set up?
If this is a setting in hdfs-site.xml, how do I set it as part of my CloudFormation template when deploying the EMR cluster?
You just have to define a security configuration that enables local disk encryption; the settings quoted above are then applied for you. No need to meddle with hdfs-site.xml directly.
Documentation for creating a security configuration can be found at: https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-create-security-configuration.html
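For the CloudFormation part of the question, a minimal sketch of the security configuration and of a cluster referencing it might look like the following. The KMS key ARN and the cluster's other properties are placeholders; only the local disk encryption block is the relevant part:

Resources:
  EmrSecurityConfiguration:
    Type: AWS::EMR::SecurityConfiguration
    Properties:
      Name: local-disk-encryption
      SecurityConfiguration:
        EncryptionConfiguration:
          EnableInTransitEncryption: false
          EnableAtRestEncryption: true
          AtRestEncryptionConfiguration:
            LocalDiskEncryptionConfiguration:
              EncryptionKeyProviderType: AwsKms
              AwsKmsKey: arn:aws:kms:us-east-1:111122223333:key/your-key-id
  EmrCluster:
    Type: AWS::EMR::Cluster
    Properties:
      # ... Name, ReleaseLabel, Instances, JobFlowRole, ServiceRole, etc. ...
      SecurityConfiguration: !Ref EmrSecurityConfiguration

Referencing the security configuration from the cluster is what enables the open-source HDFS encryption settings quoted above; nothing needs to go into hdfs-site.xml.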
Related
I have an existing ElastiCache Global Datastore (Redis). I need to enable encryption in transit. I referred to the AWS documentation; it explains how to enable it for a Redis cluster, but I am unable to do the same for my Global Datastore, though I expect the steps to be the same.
Refer to the screenshot below; there is no option to modify or change the encryption details.
I'm trying to copy data from an EMR cluster to S3 using s3-dist-cp. Can I set the number of reducers to a value greater than the default so as to speed up the process?
To set the number of reducers, you can use the property mapreduce.job.reduces, similar to the example below:
s3-dist-cp -Dmapreduce.job.reduces=10 --src hdfs://path/to/data/ --dest s3://path/to/s3/
Using S3DistCp, you can efficiently copy large amounts of data from Amazon S3 into HDFS where it can be processed by subsequent steps in your Amazon EMR cluster.
You can call S3DistCp by adding it as a step in your existing EMR cluster. Steps can be added to a cluster at launch or to a running cluster using the console, AWS CLI, or API.
So you control the number of workers during EMR cluster creation, or you can resize an existing cluster; the exact steps are in the EMR docs.
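If you run S3DistCp as a step rather than from the master node, the same property can be passed through the step arguments. A rough example using the AWS CLI (the cluster ID and paths are placeholders):

aws emr add-steps --cluster-id j-XXXXXXXXXXXX \
  --steps 'Type=CUSTOM_JAR,Name=S3DistCp,Jar=command-runner.jar,Args=[s3-dist-cp,-Dmapreduce.job.reduces=10,--src,hdfs:///path/to/data/,--dest,s3://path/to/s3/]'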
I have manually installed Confluent Kafka Connect S3 using the standalone method and not through Confluent's process or as part of the whole platform.
I can successfully launch the connector from the command line with the command:
./kafka_2.11-2.1.0/bin/connect-standalone.sh connect.properties s3-sink.properties
Topic CDC offsets from AWS MSK can be seen being consumed. No errors are thrown. However, in AWS S3, no folder structure is created for new data and no JSON data is stored.
Questions
Should the connector dynamically create the folder structure as it sees the first JSON packet for a topic?
Other than configuring awscli credentials, connect.properties, and s3-sink.properties, are there any other settings that need to be set to properly connect to the S3 bucket?
Recommendations on install documentation more comprehensive than the standalone docs on the Confluent website? (linked above)
connect.properties
bootstrap.servers=redacted:9092,redacted:9092,redacted:9092
plugin.path=/plugins/kafka-connect-s3
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false
offset.storage.file.filename=/tmp/connect.offsets
s3-sink.properties
name=s3-sink
connector.class=io.confluent.connect.s3.S3SinkConnector
tasks.max=1
topics=database_schema_topic1,database_schema_topic2,database_schema_topic3
s3.region=us-east-2
s3.bucket.name=databasekafka
s3.part.size=5242880
flush.size=1
storage.class=io.confluent.connect.s3.storage.S3Storage
format.class=io.confluent.connect.s3.format.json.JsonFormat
schema.generator.class=io.confluent.connect.storage.hive.schema.DefaultSchemaGenerator
partitioner.class=io.confluent.connect.storage.partitioner.DefaultPartitioner
schema.compatibility=NONE
Should the connector dynamically create the folder structure as it sees the first JSON packet for a topic?
Yes, and you can also control this path (the directory structure) using the parameters "topics.dir" and "path.format", as in the sketch below.
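As an illustration (not taken from your config): path.format only takes effect with a time-based partitioner, so with the DefaultPartitioner you are using, only topics.dir applies. A sketch of the relevant s3-sink.properties lines, with illustrative values:

topics.dir=raw/kafka
partitioner.class=io.confluent.connect.storage.partitioner.TimeBasedPartitioner
path.format='year'=YYYY/'month'=MM/'day'=dd
partition.duration.ms=3600000
locale=en-US
timezone=UTC
timestamp.extractor=Record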
Other than configuring awscli credentials, connect.properties, and s3-sink.properties, are there any other settings that need to be set to properly connect to the S3 bucket?
By default, the S3 connector will pick up AWS credentials (access key ID and secret access key) from environment variables or the credentials file. You can change this by modifying the parameter "s3.credentials.provider.class"; its default value is "DefaultAWSCredentialsProviderChain".
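For reference, with the default provider chain the worker just needs the standard AWS credentials available in its environment (or in ~/.aws/credentials); the values below are placeholders:

export AWS_ACCESS_KEY_ID=<your-access-key-id>
export AWS_SECRET_ACCESS_KEY=<your-secret-access-key>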
Recommendations on install documentation more comprehensive than the standalone docs on the Confluent website? (linked above)
I recommend you go with distributed mode, as it provides high availability for your Connect cluster and the connectors running on it. You can go through the documentation below to configure a Connect cluster in distributed mode:
https://docs.confluent.io/current/connect/userguide.html#connect-userguide-dist-worker-config
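As a starting point, a minimal distributed worker config (connect-distributed.properties) might look like the sketch below; the group ID, topic names, and replication factors are placeholders you should size for your MSK cluster:

bootstrap.servers=redacted:9092,redacted:9092,redacted:9092
group.id=connect-cluster
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
status.storage.topic=connect-status
config.storage.replication.factor=3
offset.storage.replication.factor=3
status.storage.replication.factor=3
plugin.path=/plugins

In distributed mode the connector configuration (your s3-sink.properties settings) is then submitted as JSON to the worker's REST API rather than passed on the command line.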
I have created an RDS instance in AWS with the encryption option enabled. I need some background on how this encryption works and how to test whether it is encrypting.
The encryption is applied to the disk volumes that your data is stored on, just like enabling encryption on the EBS volumes of EC2 instances. There's really no way for you to test that the data on disk is actually encrypted.
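What you can do is confirm that the setting is in effect. For example, the StorageEncrypted flag and the KMS key used are visible through the CLI (the instance identifier is a placeholder):

aws rds describe-db-instances --db-instance-identifier mydbinstance \
  --query 'DBInstances[0].[StorageEncrypted,KmsKeyId]'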
I want to migrate my RDS database from MySQL (5.6) to Aurora. The DB is encrypted. I am trying to follow the process outlined here, but don't have the option to enable encryption.
Elsewhere, documentation for creating RDS read replicas says that the replica will be forced to have the same encryption setting (Yes or No) as the source DB. But in this case, the encryption control for the replica is set to No, and grayed out. I tried various instance sizes, with no change.
This link says that the encryption option is "Not available," which suggests that encryption is either always on or always off.
But the FAQs say "Amazon Aurora allows you to encrypt your databases using keys you manage through AWS Key Management Service (KMS)." So I remain confused. How do I identify the KMS key, or turn at-rest encryption on or off for Aurora?
Is it possible to replicate an encrypted MySQL DB to an encrypted Aurora DB?
It doesn't appear to be possible with a managed replica, and the FAQ entry probably predates the release of managed RDS for MySQL-to-Aurora replicas... but it should be possible if you set up a new, encrypted Aurora cluster with no data, and make it an unmanaged replica of the RDS for MySQL instance, yourself.
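If you take that route, the replication itself is set up with the RDS stored procedures on the Aurora side, after loading the existing data (for example from a dump) and noting the binlog coordinates on the source. A rough sketch, with the host name, credentials, and binlog coordinates all placeholders:

-- run on the new encrypted Aurora cluster
CALL mysql.rds_set_external_master ('source-mysql.xxxxxxxx.us-east-1.rds.amazonaws.com', 3306,
  'repl_user', 'repl_password', 'mysql-bin-changelog.000123', 456, 0);
CALL mysql.rds_start_replication;

Once Aurora has caught up, you can stop replication and repoint your application at the Aurora cluster endpoint.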