I'm doing a lot of work with AWS EMR. When you build an EMR cluster through the AWS Management Console, you can click a button to export the AWS CLI command that creates the cluster.
It gives you a big CLI command that isn't formatted in any way; if you copy and paste it, it's all on a single line.
I'm using these EMR CLI commands, created by other people, to create EMR clusters in Python with the AWS SDK (the Boto3 library); that is, I read the CLI command to get all the configuration details. Some of the configuration details appear in the AWS Management Console UI, but not all of them, so it's easier for me to work from the exported CLI command.
However, the exported CLI command is very hard to read since it's not formatted. Is there an AWS CLI command formatter available online, similar to JSON formatters?
Another option is to clone the EMR cluster and step through the cluster-creation screens in the AWS Management Console to collect the configuration details, but I'm still curious whether I could format the CLI command and work that way. An added benefit of formatting the exported command is that I could put it on a Confluence page for documentation.
Here is some quick Python code to do it:
import shlex
import json
import re

def format_command(command):
    tokens = shlex.split(command)
    formatted = ''
    for token in tokens:
        # Flags start a new, indented line
        if token.startswith("--"):
            formatted += '\\\n    '
        # Pretty-print JSON arguments
        if token[0] in ('[', '{'):
            json_data = json.loads(token)
            data = json.dumps(json_data, indent=4).replace('\n', '\n    ')
            formatted += "'{}' ".format(data)
        # Quote tokens that contain whitespace (re.search, not re.match,
        # since the whitespace is not at the start of the token)
        elif re.search(r'\s', token):
            formatted += "'{}' ".format(token)
        # Print remaining tokens as-is
        else:
            formatted += token + ' '
    return formatted
example = """aws emr create-cluster --applications Name=spark Name=ganglia Name=hadoop --tags 'Project=MyProj' --ec2-attributes '{"KeyName":"emr-key","AdditionalSlaveSecurityGroups":["sg-3822994c","sg-ccc76987"],"InstanceProfile":"EMR_EC2_DefaultRole","ServiceAccessSecurityGroup":"sg-60832c2b","SubnetId":"subnet-3c76ee33","EmrManagedSlaveSecurityGroup":"sg-dd832c96","EmrManagedMasterSecurityGroup":"sg-b4923dff","AdditionalMasterSecurityGroups":["sg-3822994c","sg-ccc76987"]}' --service-role EMR_DefaultRole --release-label emr-5.14.0 --name 'Test Cluster' --instance-groups '[{"InstanceCount":1,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":32,"VolumeType":"gp2"},"VolumesPerInstance":1}]},"InstanceGroupType":"MASTER","InstanceType":"m4.xlarge","Name":"Master"},{"InstanceCount":1,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":32,"VolumeType":"gp2"},"VolumesPerInstance":1}]},"InstanceGroupType":"CORE","InstanceType":"m4.xlarge","Name":"CORE"}]' --configurations '[{"Classification":"spark-defaults","Properties":{"spark.sql.avro.compression.codec":"snappy","spark.eventLog.enabled":"true","spark.dynamicAllocation.enabled":"false"},"Configurations":[]},{"Classification":"spark-env","Properties":{},"Configurations":[{"Classification":"export","Properties":{"SPARK_DAEMON_MEMORY":"4g"},"Configurations":[]}]}]' --scale-down-behavior TERMINATE_AT_TASK_COMPLETION --region us-east-1"""
print(format_command(example))
Output looks like this:
aws emr create-cluster \
--applications Name=spark Name=ganglia Name=hadoop \
--tags Project=MyProj \
--ec2-attributes '{
"ServiceAccessSecurityGroup": "sg-60832c2b",
"InstanceProfile": "EMR_EC2_DefaultRole",
"EmrManagedMasterSecurityGroup": "sg-b4923dff",
"KeyName": "emr-key",
"SubnetId": "subnet-3c76ee33",
"AdditionalMasterSecurityGroups": [
"sg-3822994c",
"sg-ccc76987"
],
"AdditionalSlaveSecurityGroups": [
"sg-3822994c",
"sg-ccc76987"
],
"EmrManagedSlaveSecurityGroup": "sg-dd832c96"
}' \
--service-role EMR_DefaultRole \
--release-label emr-5.14.0 \
--name 'Test Cluster' \
--instance-groups '[
{
"EbsConfiguration": {
"EbsBlockDeviceConfigs": [
{
"VolumeSpecification": {
"VolumeType": "gp2",
"SizeInGB": 32
},
"VolumesPerInstance": 1
}
]
},
"InstanceCount": 1,
"Name": "Master",
"InstanceType": "m4.xlarge",
"InstanceGroupType": "MASTER"
},
{
"EbsConfiguration": {
"EbsBlockDeviceConfigs": [
{
"VolumeSpecification": {
"VolumeType": "gp2",
"SizeInGB": 32
},
"VolumesPerInstance": 1
}
]
},
"InstanceCount": 1,
"Name": "CORE",
"InstanceType": "m4.xlarge",
"InstanceGroupType": "CORE"
}
]' \
--configurations '[
{
"Properties": {
"spark.eventLog.enabled": "true",
"spark.dynamicAllocation.enabled": "false",
"spark.sql.avro.compression.codec": "snappy"
},
"Classification": "spark-defaults",
"Configurations": []
},
{
"Properties": {},
"Classification": "spark-env",
"Configurations": [
{
"Properties": {
"SPARK_DAEMON_MEMORY": "4g"
},
"Classification": "export",
"Configurations": []
}
]
}
]' \
--scale-down-behavior TERMINATE_AT_TASK_COMPLETION \
--region us-east-1
I ran the AWS-RunPatchBaseline run command; a few of my instances succeeded and a few timed out. I want to filter the instances that timed out using the AWS CLI list-command-invocations command.
When I run the CLI command below:
aws ssm list-command-invocations --command-id 7894b7658-a156-4e5g-97t2-2a9ab5498e1d
It displays the output attached here (as a screenshot).
Next, from that output, I want to filter all the instances that have "Status": "TimedOut" and "StatusDetails": "DeliveryTimedOut" (or, actually, everything other than "Status": "Success").
I tried:
aws ssm list-command-invocations --command-id 7894b7658-a156-4e5g-97t2-2a9ab5498e1d --output text --query '#[?(CommandInvocations.Status != 'Success')]'
it returns None.
I also tried
aws ssm list-command-invocations --command-id 7894b7658-a156-4e5g-97t2-2a9ab5498e1d --output text --query '#[?(#.Status != 'Success')]'
which also returns None.
And, with
aws ssm list-command-invocations --command-id 7894b7658-a156-4e5g-97t2-2a9ab5498e1d --output text --query 'CommandInvocations[?(#.Status != 'Success')]'
the output is not filtered; the complete output is returned.
Since you did not provide example output that one can copy and paste for testing purposes, this example is based on the output from the AWS documentation, where I changed the Status of the command with ID ef7fdfd8-9b57-4151-a15c-db9a12345678 and also trimmed the excess data:
{
"CommandInvocations": [
{
"CommandId": "ef7fdfd8-9b57-4151-a15c-db9a12345678",
"InstanceId": "i-02573cafcfEXAMPLE",
"InstanceName": "",
"DocumentName": "AWS-UpdateSSMAgent",
"DocumentVersion": "",
"RequestedDateTime": 1582136283.089,
"Status": "TimedOut",
"StatusDetails": "DeliveryTimeOut"
},
{
"CommandId": "ef7fdfd8-9b57-4151-a15c-db9a12345678",
"InstanceId": "i-0471e04240EXAMPLE",
"InstanceName": "",
"DocumentName": "AWS-UpdateSSMAgent",
"DocumentVersion": "",
"RequestedDateTime": 1582136283.02,
"Status": "Success",
"StatusDetails": "Success"
}
]
}
Given this JSON, the filter to apply is much like the one in the "Filter Projections" chapter of the JMESPath tutorial.
You just need to select the property containing the array, in your case CommandInvocations, and apply your condition, Status != `Success`, inside the brackets [? ].
So, with the query:
CommandInvocations[?Status != `Success`]
On the above JSON, we end up with the expected:
[
{
"CommandId": "ef7fdfd8-9b57-4151-a15c-db9a12345678",
"InstanceId": "i-02573cafcfEXAMPLE",
"InstanceName": "",
"DocumentName": "AWS-UpdateSSMAgent",
"DocumentVersion": "",
"RequestedDateTime": 1582136283.089,
"Status": "TimedOut",
"StatusDetails": "DeliveryTimeOut"
}
]
And, so, your AWS command should be:
aws ssm list-command-invocations \
--command-id 7894b7658-a156-4e5g-97t2-2a9ab5498e1d \
--output text \
--query 'CommandInvocations[?Status != `Success`]'
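If you prefer to apply the same filter in Python, here is a minimal boto3 sketch (assuming credentials and a default region are configured); the list comprehension mirrors the JMESPath filter above:

import boto3

ssm = boto3.client('ssm')

# Fetch the invocations for the command and filter client-side,
# keeping everything whose Status is not Success.
response = ssm.list_command_invocations(
    CommandId='7894b7658-a156-4e5g-97t2-2a9ab5498e1d'
)
failed = [
    inv for inv in response['CommandInvocations']
    if inv['Status'] != 'Success'
]
for inv in failed:
    print(inv['InstanceId'], inv['Status'], inv['StatusDetails'])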
When I execute this:
aws route53 list-hosted-zones
I have the following response:
{
"HostedZones": [
{
"Id": "/hostedzone/Z209RXXXXXXE2L",
"Name": "ultrasist.net.",
"CallerReference": "ECXXXX62-EXXA-8XX9-8XX5-8E52XXXXXX81",
"Config": {
"Comment": "Corporate zone",
"PrivateZone": false
},
"ResourceRecordSetCount": 46
}
]
}
So, I am pretty sure that I have the domain "ultrasist.net". Now, if I do this:
aws \
lightsail \
create-domain-entry \
--region 'us-east-1' \
--domain-name 'ultrasist.net' \
--domain-entry '{"name":"_test-txt.ultrasist.net","target":"\"xyz\"", "isAlias":false,"type":"TXT"}'
I got this response:
An error occurred (NotFoundException) when calling the CreateDomainEntry operation: The Domain does not exist: ultrasist.net
However, as you can see, the domain DOES exist. So my question is pretty obvious: why do I get this message if the domain is clearly there?
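One thing worth checking (an assumption worth ruling out: Lightsail manages its own DNS zones, separately from Route 53 hosted zones) is whether Lightsail itself knows about the domain. A minimal boto3 sketch:

import boto3

# The Lightsail DNS API is only available in us-east-1.
lightsail = boto3.client('lightsail', region_name='us-east-1')

# List the domains managed by Lightsail; a Route 53 hosted zone
# does not automatically show up here.
for domain in lightsail.get_domains()['domains']:
    print(domain['name'])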
I want to automate the running of a cluster and can use tags to get attributes of an EC2 instance like its instance-id.
The documentation on https://docs.aws.amazon.com/cli/latest/reference/emr/create-cluster.html states that
--tags (list)
A list of tags to associate with a cluster, which apply to each Amazon
EC2 instance in the cluster. Tags are key-value pairs that consist of
a required key string with a maximum of 128 characters, and an
optional value string with a maximum of 256 characters.
You can specify tags in key=value format or you can add a tag without
a value using only the key name, for example key . Use a space to
separate multiple tags.
So this applies tags to every EC2 instance including the master and slaves. How do I discern which instance is the master node?
Additional Info:
I am using the following command to get attributes from the AWS CLI based on tags; replace "Name" and "Prod" with your tag key and value respectively.
aws ec2 describe-instances | jq '.Reservations[].Instances | select(.[].Tags[].Value | startswith("Prod") ) | select(.[].Tags[].Key == "Name") | {InstanceId: .[].InstanceId, PublicDnsName: .[].PublicDnsName, State: .[].State, LaunchTime: .[].LaunchTime, Tags: .[].Tags} | [.]' | jq .[].InstanceId
As you noted, when you create an EMR cluster the tags are the same for all nodes (Master, Slave, Task).
You will find this process complicated to do with the AWS CLI alone. My recommendation is to review the examples below and then write a Python program to do this (a sketch follows the steps).
Process to add your own tags to the EC2 instances.
STEP 1: List your EMR Clusters:
aws emr list-clusters
This will output JSON:
{
"Clusters": [
{
"Id": "j-ABCDEFGHIJKLM",
"Name": "'MyCluster'",
"Status": {
"State": "WAITING",
"StateChangeReason": {
"Message": "Cluster ready after last step completed."
},
"Timeline": {
"CreationDateTime": 1536626095.303,
"ReadyDateTime": 1536626568.482
}
},
"NormalizedInstanceHours": 0
}
]
}
STEP 2: Make a note of the Cluster ID from the JSON:
"Id": "j-ABCDEFGHIJKLM",
STEP 3: Describe your EMR Cluster:
aws emr describe-cluster --cluster-id j-ABCDEFGHIJKLM
This will output JSON (I have truncated this output to just the MASTER section):
{
"Cluster": {
"Id": "j-ABCDEFGHIJKLM",
"Name": "'Test01'",
....
"InstanceGroups": [
{
"Id": "ig-2EHOYXFABCDEF",
"Name": "Master Instance Group",
"Market": "ON_DEMAND",
"InstanceGroupType": "MASTER",
"InstanceType": "m3.xlarge",
"RequestedInstanceCount": 1,
"RunningInstanceCount": 1,
"Status": {
"State": "RUNNING",
"StateChangeReason": {
"Message": ""
},
"Timeline": {
"CreationDateTime": 1536626095.316,
"ReadyDateTime": 1536626533.886
}
},
"Configurations": [],
"EbsBlockDevices": [],
"ShrinkPolicy": {}
},
....
]
}
}
STEP 4: InstanceGroups is an array. Find the entry where InstanceGroupType is MASTER. Make note of the Id.
"Id": "ig-2EHOYXFABCDEF",
STEP 5: List your cluster instances:
aws emr list-instances --cluster-id j-ABCDEFGHIJKLM
This will output JSON (I have truncated the output):
{
"Instances": [
....
{
"Id": "ci-31LGK4KIECHNY",
"Ec2InstanceId": "i-0524ec45912345678",
"PublicDnsName": "ec2-52-123-201-221.us-west-2.compute.amazonaws.com",
"PublicIpAddress": "52.123.201.221",
"PrivateDnsName": "ip-172-31-41-111.us-west-2.compute.internal",
"PrivateIpAddress": "172.31.41.111",
"Status": {
"State": "RUNNING",
"StateChangeReason": {},
"Timeline": {
"CreationDateTime": 1536626164.073,
"ReadyDateTime": 1536626533.886
}
},
"InstanceGroupId": "ig-2EHOYXFABCDEF",
"Market": "ON_DEMAND",
"InstanceType": "m3.xlarge",
"EbsVolumes": []
}
]
}
STEP 6: Find the matching InstanceGroupId ig-2EHOYXFABCDEF. This will give you the EC2 Instance ID for the MASTER: "Ec2InstanceId": "i-0524ec45912345678"
STEP 7: Tag your EC2 instance:
aws ec2 create-tags --resources i-0524ec45912345678 --tags Key=EMR,Value=MASTER
The above steps might be simpler with CLI filters and/or jq, but this should be enough information for you to find and tag the EMR master instance.
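Here is a minimal Python (boto3) sketch of the steps above, as recommended; it assumes configured credentials/region and a single active cluster:

import boto3

emr = boto3.client('emr')
ec2 = boto3.client('ec2')

# STEPS 1-2: find the active cluster ID
cluster_id = emr.list_clusters(
    ClusterStates=['WAITING', 'RUNNING']
)['Clusters'][0]['Id']

# STEPS 3-6 collapse into one call: list_instances can filter
# by instance group type directly.
master = emr.list_instances(
    ClusterId=cluster_id,
    InstanceGroupTypes=['MASTER']
)['Instances'][0]

# STEP 7: tag the master's EC2 instance
ec2.create_tags(
    Resources=[master['Ec2InstanceId']],
    Tags=[{'Key': 'EMR', 'Value': 'MASTER'}]
)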
The command below can be used to get the instance ID directly:
aws emr list-instances \
    --cluster-id ${aws_emr_cluster.cluster.id} \
    --instance-group-id ${aws_emr_cluster.cluster.master_instance_group.0.id} \
    --query 'Instances[*].Ec2InstanceId' \
    --output text
In an environment where you do not have the AWS CLI, you can cat the following file:
cat /mnt/var/lib/info/job-flow.json
An example of the content is the following:
{
"jobFlowId": "j-0000X0X0X00XX",
"jobFlowCreationInstant": 1579512208006,
"instanceCount": 2,
"masterInstanceId": "i-00x0xx0000xxx0x00",
"masterPrivateDnsName": "localhost",
"masterInstanceType": "m5.xlarge",
"slaveInstanceType": "m5.xlarge",
"hadoopVersion": "2.8.5",
"instanceGroups": [
{
"instanceGroupId": "ig-0XX00XX0X0XXX",
"instanceGroupName": "Master - 1",
"instanceRole": "Master",
"marketType": "OnDemand",
"instanceType": "m5.xlarge",
"requestedInstanceCount": 1
},
{
"instanceGroupId": "ig-000X0XXXXXXX",
"instanceGroupName": "Core - 2",
"instanceRole": "Core",
"marketType": "OnDemand",
"instanceType": "m5.xlarge",
"requestedInstanceCount": 1
}
    ]
}
NOTE: I've obfuscated the IDs, using 0 where a number is expected and X where a letter is expected.
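Since the file is plain JSON, a short Python sketch can pull the master instance ID out of it (field names as shown in the example above):

import json

# Read EMR's job-flow.json, present on cluster nodes,
# and print the master's EC2 instance ID.
with open('/mnt/var/lib/info/job-flow.json') as f:
    job_flow = json.load(f)
print(job_flow['masterInstanceId'])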
You can do this programmatically in 3 lines of code, without having to copy-paste any of the specific information:
# get cluster id
CLUSTER_ID=$(aws emr list-clusters --active --query "Clusters[0].Id" --output text)
# get instance id
INSTANCE_ID=$(aws emr list-instances --cluster-id $CLUSTER_ID --instance-group-types MASTER --query "Instances[0].Ec2InstanceId" --output text)
# tag
aws ec2 create-tags --resources $INSTANCE_ID --tags Key=EMR,Value=MASTER
The example below is for instance fleets; it saves the cluster ID, instance fleet ID, and master IP as environment variables.
Replace the cluster name "My-Cluster" with your actual cluster name.
export CLUSTER_ID=$(aws emr list-clusters --active --query 'Clusters[?Name==`My-Cluster`].Id' --output text)
export INSTANCE_FLEET=$(aws emr describe-cluster --cluster-id $CLUSTER_ID | jq -r '.[].InstanceFleets | .[] | select(.InstanceFleetType=="MASTER") | .Id')
export PRIVATE_IP=$(aws emr list-instances --cluster-id $CLUSTER_ID --instance-fleet-id $INSTANCE_FLEET --query 'Instances[*].PrivateIpAddress' --output text)
"Cleanest" way:
aws emr list-clusters --active
Find your cluster ID (j-xxxxxxxxxxx) in the output, then:
aws emr list-instances --region {your_region} --instance-group-types MASTER --cluster-id j-xxxxxxxxxxxxx
This immediately filters to the master instance(s) and their information via the --instance-group-types MASTER flag.
For tagging, refer to the other answers using aws {resource} create-tags and the --tags flag.
I am unable to set environment variables for my Spark application. I am using AWS EMR to run a Spark application, which is really a framework I wrote in Python on top of Spark to run multiple Spark jobs according to the environment variables present. So in order to start the right job, I need to pass an environment variable into spark-submit. I tried several methods to do this, but none of them work: when I try to print the value of the environment variable inside the application, it comes back empty.
To launch the cluster in EMR I am using the following AWS CLI command:
aws emr create-cluster --applications Name=Hadoop Name=Hive Name=Spark --ec2-attributes '{"KeyName":"<Key>","InstanceProfile":"<Profile>","SubnetId":"<Subnet-Id>","EmrManagedSlaveSecurityGroup":"<Group-Id>","EmrManagedMasterSecurityGroup":"<Group-Id>"}' --release-label emr-5.13.0 --log-uri 's3n://<bucket>/elasticmapreduce/' --bootstrap-action 'Path="s3://<bucket>/bootstrap.sh"' --steps file://./.envs/steps.json --instance-groups '[{"InstanceCount":1,"InstanceGroupType":"MASTER","InstanceType":"c4.xlarge","Name":"Master"}]' --configurations file://./.envs/Production.json --ebs-root-volume-size 64 --service-role EMRRole --enable-debugging --name 'Application' --auto-terminate --scale-down-behavior TERMINATE_AT_TASK_COMPLETION --region <region>
Now Production.json looks like this:
[
{
"Classification": "yarn-env",
"Properties": {},
"Configurations": [
{
"Classification": "export",
"Properties": {
"FOO": "bar"
}
}
]
},
{
"Classification": "spark-defaults",
"Properties": {
"spark.executor.memory": "2800m",
"spark.driver.memory": "900m"
}
}
]
And steps.json like this:
[
{
"Name": "Job",
"Args": [
"--deploy-mode","cluster",
"--master","yarn","--py-files",
"s3://<bucket>/code/dependencies.zip",
"s3://<bucket>/code/__init__.py",
"--conf", "spark.yarn.appMasterEnv.SPARK_YARN_USER_ENV=SHAPE=TRIANGLE",
"--conf", "spark.yarn.appMasterEnv.SHAPE=RECTANGLE",
"--conf", "spark.executorEnv.SHAPE=SQUARE"
],
"ActionOnFailure": "CONTINUE",
"Type": "Spark"
}
]
When I try to access the environment variable inside my __init__.py code, it simply prints empty. As you can see, I am running the step using Spark on YARN in cluster mode. I went through these links to get this far:
How do I set an environment variable in a YARN Spark job?
https://spark.apache.org/docs/latest/configuration.html#environment-variables
https://spark.apache.org/docs/latest/configuration.html#runtime-environment
Thanks for any help.
Use classification yarn-env to pass environment variables to the worker nodes.
Use classification spark-env to pass environment variables to the driver, with deploy mode client. When using deploy mode cluster, use yarn-env.
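As a quick sanity check inside the PySpark job itself (assuming the yarn-env classification from Production.json above, which exports FOO=bar):

import os

# With the yarn-env "export" classification and deploy mode cluster,
# the variable should be visible inside the YARN containers that run
# the driver and executors.
print(os.environ.get('FOO'))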
To work with EMR clusters, I use AWS Lambda, in a project that builds an EMR cluster when a condition flag is set.
Inside this project, we define variables that you can set in the Lambda and then replace with their values. To do this, we use the AWS API; the method to use is AWSSimpleSystemsManagement.getParameters.
Then build a map like val parametersValues = parameterResult.getParameters.asScala.map(k => (k.getName, k.getValue)) to get a tuple of each parameter's name and value.
E.g.: ${BUCKET} = "s3://bucket-name/"
This means you only have to write ${BUCKET} in your JSON instead of the whole path.
Once you have replaced the values, the step JSON can look like this (a Python sketch of the substitution follows the JSON):
[
{
"Name": "Job",
"Args": [
"--deploy-mode","cluster",
"--master","yarn","--py-files",
"${BUCKET}/code/dependencies.zip",
"${BUCKET}/code/__init__.py",
"--conf", "spark.yarn.appMasterEnv.SPARK_YARN_USER_ENV=SHAPE=TRIANGLE",
"--conf", "spark.yarn.appMasterEnv.SHAPE=RECTANGLE",
"--conf", "spark.executorEnv.SHAPE=SQUARE"
],
"ActionOnFailure": "CONTINUE",
"Type": "Spark"
}
]
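The same idea in Python, as a hedged sketch (the parameter name BUCKET is illustrative): fetch the values from SSM Parameter Store and substitute the ${NAME} placeholders in the step JSON before submitting it:

import boto3

ssm = boto3.client('ssm')

# Fetch the parameters and build a name -> value map,
# like the Scala snippet above.
result = ssm.get_parameters(Names=['BUCKET'])
values = {p['Name']: p['Value'] for p in result['Parameters']}

# Substitute ${NAME} placeholders in the step template.
with open('steps.json') as f:
    template = f.read()
for name, value in values.items():
    template = template.replace('${%s}' % name, value)
print(template)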
I hope this can help you to solve your problem.
Based on my research, I've found that the only way to use the CLI to create an Aurora cluster with instances inside it from an existing snapshot is to follow these steps:
1) Create a snapshot from the existing cluster
2) Launch a cluster from the snapshot
3) Add an instance to the cluster
Thus, the commands I ran using the most up-to-date AWS CLI version are these (along with their outputs):
aws rds create-db-cluster-snapshot \
--db-cluster-snapshot-identifier analytics-replica-db \
--db-cluster-identifier prodcluster
which outputs
{
"DBClusterSnapshot": {
"Engine": "aurora",
"SnapshotCreateTime": "2017-07-24T15:08:12.836Z",
"VpcId": "vpc-ID",
"DBClusterIdentifier": "cluster_name",
"DBClusterSnapshotArn": "arn:aws:rds:eu-west-1:aws_account:cluster-snapshot:analytics-replica-db",
"MasterUsername": "db_username",
"LicenseModel": "aurora",
"Status": "creating",
"PercentProgress": 0,
"DBClusterSnapshotIdentifier": "analytics-replica-db",
"IAMDatabaseAuthenticationEnabled": false,
"ClusterCreateTime": "2016-04-14T11:10:02.413Z",
"StorageEncrypted": false,
"AllocatedStorage": 1,
"EngineVersion": "5.6.10a",
"SnapshotType": "manual",
"AvailabilityZones": [
"eu-west-1a",
"eu-west-1b",
"eu-west-1c"
],
"Port": 0
}
}
After that, I create the cluster using this:
aws rds restore-db-cluster-from-snapshot \
--db-cluster-identifier analytics-replica-cluster \
--snapshot-identifier analytics-replica-db \
--engine aurora \
--port 3306 \
--db-subnet-group-name this_is_a_subnet_group \
--database-name this_is_the_database_name_equal_to_original_cluster_db \
--vpc-security-group-ids this_is_a_random_security_group \
--no-enable-iam-database-authentication
which outputs
{
"DBCluster": {
"MasterUsername": "this_is_the_same_username_as_the_one_on_original_db",
"ReaderEndpoint": "this_is_the_new_RDS_endpoint_of_cluster",
"ReadReplicaIdentifiers": [],
"VpcSecurityGroups": [
{
"Status": "active",
"VpcSecurityGroupId": "this_is_that_security_group"
}
],
"HostedZoneId": "Z29XKXDKYMONMX",
"Status": "creating",
"MultiAZ": false,
"PreferredBackupWindow": "23:50-00:20",
"DBSubnetGroup": "this_is_a_subnet_group",
"AllocatedStorage": 1,
"BackupRetentionPeriod": 10,
"PreferredMaintenanceWindow": "fri:03:34-fri:04:04",
"Engine": "aurora",
"Endpoint": "this_is_the_new_RDS_endpoint_of_reader",
"AssociatedRoles": [],
"IAMDatabaseAuthenticationEnabled": false,
"ClusterCreateTime": "2017-07-24T15:11:07.003Z",
"EngineVersion": "5.6.10a",
"DBClusterIdentifier": "analytics-replica-cluster",
"DbClusterResourceId": "cluster-resource_id",
"DBClusterMembers": [],
"DBClusterArn": "arn:aws:rds:eu-west-1:aws_account:cluster:analytics-replica-cluster",
"StorageEncrypted": false,
"DatabaseName": "this_is_the_database_name_equal_to_original_cluster_db",
"DBClusterParameterGroup": "default.aurora5.6",
"AvailabilityZones": [
"eu-west-1a",
"eu-west-1b",
"eu-west-1c"
],
"Port": 3306
}
}
And now, all I want to do is run this:
aws rds create-db-instance \
--db-name this_is_the_database_name_equal_to_original_cluster_db \
--db-instance-identifier analytics-replica-instance \
--db-instance-class "db.r3.large" \
--publicly-accessible \
--no-enable-iam-database-authentication \
--db-cluster-identifier analytics-replica-cluster \
--engine aurora
which outputs
An error occurred (InvalidParameterCombination) when calling the CreateDBInstance operation: The requested DB Instance will be a member of a DB Cluster. Set database name for the DB Cluster.
Can someone PLEASE tell me why it hates me?
For anyone facing the same problem: some options are not available when you add a new instance to an Aurora cluster.
The error is tricky at first, but it is easy to understand. The last part, Set database name for the DB Cluster., points at the real problem; most of the time you need to delete the corresponding property (here, --db-name).
Another example: An error occurred (InvalidParameterCombination) when calling the CreateDBInstance operation: The requested DB Instance will be a member of a DB Cluster. Set backup retention period for the DB Cluster. means you need to remove --backup-retention-period. A sketch of the corrected call follows.
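As a hedged illustration in Python (boto3), here is what the working call looks like once the cluster-level options are omitted; the database name and the backup retention period are inherited from the cluster:

import boto3

rds = boto3.client('rds')

# Note: no DBName and no BackupRetentionPeriod here; the instance
# inherits both from the Aurora cluster it joins.
rds.create_db_instance(
    DBInstanceIdentifier='analytics-replica-instance',
    DBInstanceClass='db.r3.large',
    Engine='aurora',
    DBClusterIdentifier='analytics-replica-cluster',
    PubliclyAccessible=True
)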