How do I set multiple --conf table parameters in AWS Glue? - amazon-web-services

Multiple answers on Stack Overflow for AWS Glue say to set the --conf table parameter. However, sometimes a single job needs multiple --conf key/value pairs.
I've tried the following ways to set multiple --conf values, all resulting in errors:
add another table parameter called --conf. The AWS console removes the second parameter named --conf and moves focus to the value of the first one. Terraform likewise treats both table parameters with the key --conf as the same entry and overwrites the first parameter's value with the second's.
separate the config key/value pairs with a space in the value of the --conf table parameter, e.g. spark.yarn.executor.memoryOverhead=1024 spark.yarn.executor.memoryOverhead=7g spark.yarn.executor.memory=7g. This results in a failure to start the job.
separate the config key/value pairs with a comma in the value of the --conf table parameter, e.g. spark.yarn.executor.memoryOverhead=1024, spark.yarn.executor.memoryOverhead=7g, spark.yarn.executor.memory=7g. This results in a failure to start the job.
set the value of --conf so that the string --conf separates each key/value pair, e.g. spark.yarn.executor.memoryOverhead=1024 --conf spark.yarn.executor.memoryOverhead=7g --conf spark.yarn.executor.memory=7g. This results in the Glue job hanging.
How do I set multiple --conf table parameters in AWS Glue?

You can pass multiple parameters as below:
Key: --conf
Value: spark.yarn.executor.memoryOverhead=7g --conf spark.yarn.executor.memory=7g
This has worked for me.

You can override the parameters by editing the job and adding job parameters. The key and value I used are here:
Key: --conf
Value: spark.yarn.executor.memoryOverhead=7g
This seemed counterintuitive since the setting key is actually in the value, but it was recognized. So if you're attempting to set spark.yarn.executor.memory the following parameter would be appropriate:
Key: --conf
Value: spark.yarn.executor.memory=7g
Find more information in the answer this was adapted from: https://stackoverflow.com/a/50122948/10968161
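The same trick carries over when the job is defined programmatically: the extra --conf separators live inside the value of the single --conf default argument. A rough boto3 sketch (the job name, role, and script location are placeholders, not values from the question):
import boto3

glue = boto3.client("glue")

# The single "--conf" default argument embeds further "--conf" separators in
# its value, mirroring the console trick above. Name, role, and script
# location are placeholders for illustration.
glue.create_job(
    Name="my-glue-job",
    Role="my-glue-role",
    Command={"Name": "glueetl", "ScriptLocation": "s3://my-bucket/scripts/my_job.py"},
    DefaultArguments={
        "--conf": "spark.yarn.executor.memoryOverhead=1024 --conf spark.yarn.executor.memory=7g"
    },
)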

Related

AWS CLI cloudwatch log subscription without any filter pattern

I'm trying to get logs from an application running in AWS, including all logs without any filter pattern:
aws logs put-subscription-filter --log-group-name "abc-loggroup" --filter-name "Destination" --filter-pattern "" --destination-arn "arn:test" --role-arn "arn:role"
I'm getting the following error:
aws: error: argument --filter-pattern: expected one argument
I tried running the command without the "--filter-name" parameter and it failed as well.
How can I run this command without any filter pattern?
As per the PutSubscriptionFilter documentation, you need to provide both parameters.
filterPattern
A filter pattern for subscribing to a filtered stream of log events.
Type: String
Length Constraints: Minimum length of 0. Maximum length of 1024.
Required: Yes
filterName
Type: String
Length Constraints: Minimum length of 1. Maximum length of 512.
Pattern: [^:]
Required: Yes
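If the CLI quoting of the empty string proves troublesome, the same call can be made from Python, where an empty filterPattern is passed directly. A minimal boto3 sketch using the names and ARNs from the question:
import boto3

logs = boto3.client("logs")

# An empty filterPattern subscribes to every event in the log group.
logs.put_subscription_filter(
    logGroupName="abc-loggroup",
    filterName="Destination",
    filterPattern="",
    destinationArn="arn:test",
    roleArn="arn:role",
)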

AWS CLI DynamoDB Called From Powershell Put-Item fails when a value contains a space

So, let's say I'm trying to post this JSON via the command line (not in a file, because I'm not going to write a file for every invocation of this script) to a DynamoDB table:
{\"TeamId\":{\"S\":\"One_Space_123\"},\"TeamName\":{\"S\":\"One_Space\"},\"Environment\":{\"S\":\"cte\"},\"StartDate\":{\"S\":\"null\"},\"EndDate\":{\"S\":\"null\"},\"CreatedDate\":{\"S\":\"today\"},\"CreatedBy\":{\"S\":\"someones user\"},\"EmailDistributionList\":{\"S\":\"test#test.com\"},\"RemedyGroup\":{\"S\":\"OneSpace\"},\"ScomSubscriptionId\":{\"S\":\"guid-ab22-2345\"},\"ZabbixActionId\":{\"S\":\"11\"},\"SnsTopic\":{\"M\":{\"TopicName\":{\"S\":\"ATopicName\"},\"TopicArn\":{\"S\":\"AtopicArn1234\"},\"CreatedDate\":{\"S\":\"today\"},\"CreatedBy\":{\"S\":\"someones user\"}}}}
The result from the CLI is an error like this:
Unknown options: Space"},"ScomSubscriptionId":{"S":"guid-ab22-2345"},"ZabbixActionId":{"S":"11"},"SnsTopic":{"M":{"TopicName":{"S":"ATopicName"},"TopicArn":{"S":"AtopicArn1234"},"CreatedDate":{"S":"today"},"CreatedBy":{"S":"someones, user"}}}}, user"},"EmailDistributionList":{"S":"test#test.com"},"RemedyGroup":{"S":"One
As you can see, it fails on the TeamName property, which in the above example is "One Space". If I change that value to "OneSpace", it instead starts to fail on the CreatedBy property, which is populated with "someones user". But if I remove all spaces from all properties, I can suddenly pass this JSON to DynamoDB successfully.
In a working example the json looks like this:
{\"TeamId\":{\"S\":\"One_Space_123\"},\"TeamName\":{\"S\":\"One_Space\"},\"Environment\":{\"S\":\"cte\"},\"StartDate\":{\"S\":\"null\"},\"EndDate\":{\"S\":\"null\"},\"CreatedDate\":{\"S\":\"today\"},\"CreatedBy\":{\"S\":\"someonesuser\"},\"EmailDistributionList\":{\"S\":\"test#test.com\"},\"RemedyGroup\":{\"S\":\"OneSpace\"},\"ScomSubscriptionId\":{\"S\":\"guid-ab22-2345\"},\"ZabbixActionId\":{\"S\":\"11\"},\"SnsTopic\":{\"M\":{\"TopicName\":{\"S\":\"ATopicName\"},\"TopicArn\":{\"S\":\"AtopicArn1234\"},\"CreatedDate\":{\"S\":\"today\"},\"CreatedBy\":{\"S\":\"someonesuser\"}}}}
I can't find any documentation that tells me I can't have spaces; if I read this in from a file, it posts with the spaces, so what gives? If anyone has any advice on this matter, I'd certainly appreciate it.
For what it's worth, in PowerShell the execution currently looks like this (though I've tried various combinations of quoting the $dbTeamTableEntry variable):
$dbEntry = aws.exe dynamodb put-item --region $region --table-name $table --item "$($dbTeamTableEntry)"
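As an aside, the attribute values themselves are fine with spaces: the same item goes through when the map is built as a native structure rather than a shell-quoted string. A rough boto3 sketch using a few of the attributes from the question (the table name is a placeholder):
import boto3

dynamodb = boto3.client("dynamodb")

# Values containing spaces ("One Space", "someones user") need no special
# escaping when passed as a Python dict rather than a shell-quoted string.
dynamodb.put_item(
    TableName="TeamTable",  # placeholder table name
    Item={
        "TeamId": {"S": "One_Space_123"},
        "TeamName": {"S": "One Space"},
        "CreatedBy": {"S": "someones user"},
    },
)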

How to modify multiline parameters in CloudFormation?

I created a CloudFormation stack using the ECS wizard. I want to customize some UserData entries by modifying stack parameters. However, parameters that should span multiple lines are shown on a single line in the console. Checking the current parameter, it is applied across multiple lines, but after modifying it in the web UI the UserData parameter collapses to one line, so the script no longer works. Is there a way to update the values normally?
Unfortunately, the CloudFormation console does not currently support inputting multiline parameters.
There are a couple of workarounds:
The AWS CLI supports multiline parameters: --parameters ParameterKey=<>,ParameterValue='line 1
line 2'
Removing the Fn::Base64 function from UserData in the template and passing an already Base64-encoded string as the parameter should remove the need for the parameter to be multiline (see the sketch after this list).
Avoiding explicitly inputting parameter values when the default or previous value of a parameter is desired: --parameters ParameterKey=<>,UsePreviousValue=true
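To sketch the second and third workarounds in code, the boto3 call below passes an already Base64-encoded UserData value on a single line and keeps another parameter at its previous value. The stack name, parameter names, and script body are placeholders, and it assumes the template no longer wraps the parameter in Fn::Base64:
import base64
import boto3

cfn = boto3.client("cloudformation")

# Encode the multiline script up front so the parameter value is a single line.
user_data = "#!/bin/bash\necho ECS_CLUSTER=my-cluster >> /etc/ecs/ecs.config\n"
encoded = base64.b64encode(user_data.encode("utf-8")).decode("utf-8")

cfn.update_stack(
    StackName="my-ecs-stack",  # placeholder stack name
    UsePreviousTemplate=True,
    Parameters=[
        {"ParameterKey": "UserData", "ParameterValue": encoded},
        {"ParameterKey": "KeyName", "UsePreviousValue": True},  # placeholder parameter
    ],
    Capabilities=["CAPABILITY_IAM"],  # needed if the template creates IAM resources
)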

Is there a way to set multiple --conf as job parameters in AWS Glue?

I'm trying to configure Spark in my Glue jobs. When I input them one by one in 'Edit job' > 'Job Parameters' as a key and value pair (e.g. key: --conf, value: spark.executor.memory=10g) it works, but when I try putting them all together (delimited by a space or comma), it results in an error. I also tried using sc._conf.setAll, but Glue ignores the config and insists on using its defaults. Is there a way to do this with Spark 2.4?
Yes, you can pass multiple parameters as below:
Key: --conf
Value: spark.yarn.executor.memoryOverhead=7g --conf spark.yarn.executor.memory=7g
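Setting the values with sc._conf.setAll has no effect once the SparkContext already exists, which is likely why Glue appears to ignore it. To confirm from inside the job that the --conf job parameter was picked up, a short check in the Glue script can help (a minimal sketch):
from pyspark.context import SparkContext

# Print the effective Spark configuration so the CloudWatch logs show whether
# the --conf job parameter took effect.
sc = SparkContext.getOrCreate()
for key, value in sc.getConf().getAll():
    if key.startswith("spark."):
        print(key, "=", value)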

Pass comma separated argument to spark jar in AWS EMR using CLI

I am using the AWS CLI to create an EMR cluster and add a step. My create-cluster command looks like:
aws emr create-cluster --release-label emr-5.0.0 --applications Name=Spark --ec2-attributes KeyName=*****,SubnetId=subnet-**** --use-default-roles --bootstrap-action Path=$S3_BOOTSTRAP_PATH --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m4.large InstanceGroupType=CORE,InstanceCount=$instanceCount,InstanceType=m4.4xlarge --steps Type=Spark,Name="My Application",ActionOnFailure=TERMINATE_CLUSTER,Args=[--master,yarn,--deploy-mode,client,$JAR,$inputLoc,$outputLoc] --auto-terminate
$JAR - is my Spark jar, which takes two params, input and output
$inputLoc is basically a comma-separated list of input files like s3://myBucket/input1.txt,s3://myBucket/input2.txt
However, the AWS CLI treats the comma-separated values as separate arguments, so my second input file is treated as the second parameter, and $outputLoc here becomes s3://myBucket/input2.txt
Is there any way to escape the comma and treat this whole argument as a single value in the CLI command, so that Spark can handle reading multiple files as input?
It seems there is no way to escape the comma in the input files argument.
After trying quite a few approaches, I finally resorted to a hack: passing a different delimiter to separate the input files and handling it in the code. In my case, I added % as my delimiter, and in the driver code I do:
// Restore the real comma-separated file list before handing it to Spark
if (inputLoc.contains("%")) {
    inputLoc = inputLoc.replaceAll("%", ",");
}
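With that in place, the step arguments in the create-cluster command carry the file list joined by the custom delimiter, e.g. s3://myBucket/input1.txt%s3://myBucket/input2.txt instead of the comma-separated $inputLoc, and the driver restores the commas before handing the paths to Spark.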