AWS Data Pipeline: escaping commas in the EMR activity step section

I am creating an AWS Data Pipeline using the Architect provided in the AWS web console.
Everything is set up OK; my EMR cluster is configured and starts successfully.
But when I try to submit an EMR activity I run into the following problem:
In the step section of the EMR activity I need to pass the --packages argument with three packages.
As far as I understand, the step field in EmrActivity is a comma-separated value, and commas (,) are replaced with spaces in the resulting step arguments.
On the other hand, the --packages argument is itself comma-separated when there are multiple packages.
So when I pass this as an argument, the commas get replaced with spaces, which makes the step invalid.
This is the argument I need to appear as-is in the resulting EMR step:
--packages com.amazonaws:aws-java-sdk-s3:1.11.228,org.apache.hadoop:hadoop-aws:2.6.0,org.postgresql:postgresql:42.1.4
Is there any way to escape the comma?
So far I have tried the \\\\ approach mentioned in http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-emractivity.html
It did not work.

When you use \\\\, it escapes the slashes and the comma still gets replaced.
Try using three backslashes instead, like \\\, . That is what worked for me.
I hope that helps.
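For illustration, the --packages portion of the step string would then look something like this (an untested sketch based on the three-backslash escaping above; the comma after --packages is the normal step-argument delimiter and is meant to become a space):
--packages,com.amazonaws:aws-java-sdk-s3:1.11.228\\\,org.apache.hadoop:hadoop-aws:2.6.0\\\,org.postgresql:postgresql:42.1.4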

Related

How do I specify multiple shell scripts as initialization actions for Dataproc cluster creation?

Google's documentation says that --initialization-actions takes a list of GCS URLs. If I specify one:
--initialization-actions 'gs://my-project/myscript.sh'
This works fine.
But if I specify two:
--initialization-actions 'gs://my-project/myscript.sh', 'gs://my-project/myscript2.sh'
Gives the following error:
INVALID_ARGUMENT: Google Cloud Storage object does not exist 'gs://my-project/myscript.sh gs://my-project/myscript2.sh'
The same happens without quotes, and with or without a space after the comma.
I tried encapsulating in square brackets:
--initialization-actions ['gs://my-project/myscript.sh', 'gs://my-project/myscript2.sh']
And the error this time is:
Executable '['gs://my-project/myscript.sh', 'gs://my-project/myscript2.sh']' URI must begin with 'gs://'
I can confirm one million percent that the paths I am using are valid, and that both objects are valid shell scripts. Is there something obvious I am missing?
You should remove the space between the scripts:
--initialization-actions gs://my-project/myscript.sh,gs://my-project/myscript2.sh
Just figured it out, the format needs to be:
--initialization-actions 'gs://my-project/myscript.sh, gs://my-project/myscript2.sh'
i.e. both scripts in a single set of quotes, separated by a comma.
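Putting it together, a full command along these lines should work (the cluster name here is just a placeholder):
gcloud dataproc clusters create my-cluster --initialization-actions 'gs://my-project/myscript.sh,gs://my-project/myscript2.sh'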

Creating Kinesis Analytics applications using the AWS CLI

I want to create a Kinesis Analytics application using the AWS CLI. I use this command to create the application:
aws kinesisanalytics create-application --application-name smartfactorytest1 --application-code "CREATE OR REPLACE STREAM DESTINATION_SQL_STREAM ( "device_serial" VARCHAR(16), "uploadRate" INTEGER, "downloadRate" INTEGER);
CREATE OR REPLACE PUMP "STREAM_PUMP"
AS INSERT INTO DESTINATION_SQL_STREAM
SELECT STREAM "device_serial", "uploadRate", "downloadRate"
FROM SOURCE_SQL_STREAM_001
-- LIKE compares a string to a string pattern (_ matches all char, % matches substring)
-- SIMILAR TO compares string to a regex, may use ESCAPE
WHERE "uploadRate" >20000" --inputs NamePrefix="SOURCE_SQL_STREAM",KinesisStreamsInput={ResourceARN="sourcearn",RoleARN="rolearn"}
But I get this error:
invalid type for parameter Inputs[0].KinesisStreamsInput, value: ResourceARN=string, type: <class 'str'>, valid types: <class 'dict'>
Can anyone tell me what I am doing wrong? Any help would be appreciated.
I believe the issue is either that you need to take the quotes out in the KinesisStreamsInput section, or you need to add quotes and escape them. The documentation is unclear on which is the correct option.
According to the AWS Kinesis Analytics CLI Reference, https://docs.aws.amazon.com/cli/latest/reference/kinesisanalytics/create-application.html, the syntax for --inputs with KinesisStreamsInput should look like the example provided for KinesisStreamsOutput:
Name=string,KinesisStreamsOutput={ResourceARN=string,RoleARN=string},...
This would mean removing the quotes around your sourcearn and rolearn. However, the documentation isn't clear that this refers to the CLI syntax in all cases.
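That is, something along these lines (untested; keeping the placeholder ARNs from the question):
--inputs NamePrefix=SOURCE_SQL_STREAM,KinesisStreamsInput={ResourceARN=sourcearn,RoleARN=rolearn}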
If that doesn't work, this AWS CLI usage guide page, https://docs.aws.amazon.com/cli/latest/userguide/cli-usage-parameters-quoting-strings.html, describes adding quotes and escaping the relevant ones, depending on your OS:
"Linux or macOS
Use single quotation marks (' ') to enclose the JSON data structure, as in the following example. You don't have to do anything special with the double quotation marks embedded in the JSON string.
aws ec2 run-instances --image-id ami-12345678 --block-device-mappings '[{"DeviceName":"/dev/sdb","Ebs":{"VolumeSize":20,"DeleteOnTermination":false,"VolumeType":"standard"}}]'
PowerShell
PowerShell requires single quotation marks (' ') to enclose the JSON data structure. Also, because double quotation marks have a special meaning to PowerShell, you must use a backslash (\) to escape each double quotation mark (") within the JSON structure, as in the following example.
PS C:\> aws ec2 run-instances --image-id ami-12345678 --block-device-mappings '[{\"DeviceName\":\"/dev/sdb\",\"Ebs\":{\"VolumeSize\":20,\"DeleteOnTermination\":false,\"VolumeType\":\"standard\"}}]'
Windows Command Prompt
The Windows command prompt requires double quotation marks (" ") to enclose the JSON data structure. Also, to prevent the command processor from misinterpreting the double quotation marks embedded in the JSON, you must also escape (precede with a backslash [ \ ] character) each double quotation mark (") within the JSON data structure itself, as in the following example.
C:\> aws ec2 run-instances --image-id ami-12345678 --block-device-mappings "[{\"DeviceName\":\"/dev/sdb\",\"Ebs\":{\"VolumeSize\":20,\"DeleteOnTermination\":false,\"VolumeType\":\"standard\"}}]"
Only the outermost double quotation marks are not escaped."
This link also references needing to escape quotes on Windows, and is using the kinesisanalytics command: https://github.com/aws/aws-cli/issues/3103
"Rishi74744 commented on Feb 6, 2018
I got it to work as -
aws kinesisanalytics add-application-reference-data-source --endpoint https://kinesisanalytics.us-east-1.amazonaws.com --region us-east-1 --application-name alerts --reference-data-source "{\"TableName\":\"DeviceData\",\"S3ReferenceDataSource\":{\"BucketARN\":\"arn:aws:s3:::bucket-name\",\"FileKey\":\"device.csv\",\"ReferenceRoleARN\":\"arn:aws:iam::account-id:role/role-name\"},\"ReferenceSchema\":{\"RecordFormat\":{\"RecordFormatType\":\"CSV\",\"MappingParameters\":{\"CSVMappingParameters\":{\"RecordRowDelimiter\":\"\n\",\"RecordColumnDelimiter\":\", \"}}},\"RecordEncoding\":\"UTF-8\",\"RecordColumns\":[{\"Name\":\"key1\",\"SqlType\":\"VARCHAR(64)\"},{\"Name\":\"key2\",\"SqlType\":\"VARCHAR(64)\"}]}}" --current-application-version-id 2
But this should be mentioned in the documentation."
One note: it may be preferable to use JSON files as inputs and use this syntax instead: --cli-input-json file://input.json. This is referenced in the AWS Kinesis CLI Command Reference (first link, under 1.) and also mentioned in the GitHub link above. It's also the method used by the majority of the AWS Kinesis documentation. For example, JSON files used for different purposes in Kinesis Analytics:
https://docs.aws.amazon.com/kinesisanalytics/latest/dev/how-it-works-input.html
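As a rough, untested sketch, an input.json for this case might look something like the following (the Inputs entry also needs an InputSchema block, omitted here for brevity; running aws kinesisanalytics create-application --generate-cli-skeleton prints the exact expected shape):
{
  "ApplicationName": "smartfactorytest1",
  "ApplicationCode": "CREATE OR REPLACE STREAM ...",
  "Inputs": [
    {
      "NamePrefix": "SOURCE_SQL_STREAM",
      "KinesisStreamsInput": {
        "ResourceARN": "sourcearn",
        "RoleARN": "rolearn"
      }
    }
  ]
}
It would then be passed as: aws kinesisanalytics create-application --cli-input-json file://input.json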
Please let me know what works, and I will work with my AWS rep to improve the documentation.

AWS CloudFormation keys not accepting special characters

I have noticed that AWS CloudFormation does not like special characters.
When I update a key:value pair in our pipeline.yml file with a value containing special characters,
e.g. PAR_FTP_PASS: ^XoN*H89Ie!rhpl!wan=Jcyo6mo, I see the following error:
parameters[5] ParameterKey, ParameterValue or UsePreviousValue expected
I am able to update the value through the AWS CloudFormation UI.
It seems like the issue has to do with AWS CloudFormation parsing the YAML file.
Is there a workaround for this issue?
AWS Tags have some restrictions on what they can contain, see here:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Using_Tags.html#tag-restrictions
A key note which can catch people out is: "Although EC2 allows for any character in its tags, other services are more restrictive. The allowed characters across services are: letters, numbers, and spaces representable in UTF-8, and the following characters: + - = . _ : / #."
So I'd check if the service you are adding this onto can support that string.

Escaping Terraform String Properly

How can I properly escape a Terraform string that contains double curly braces so it isn't interpolated? I'm reading a JSON file using templating and it keeps failing on this.
"customInventory": "{{ customInventory }}"
I want to keep the double braces. Nothing works so far, and this is preventing this value from being passed correctly to an AWS SSM document. The Terraform documentation doesn't provide much insight beyond escaping quotes and dollar signs.
I've tried Unicode values, double braces, backslashes and other permutations without any success.
This syntax is AWS SSM document parameter syntax. The error was actually not from Terraform but from AWS, which reported invalid input when attempting to create the document. Changing the value to Enabled instead of {{ customInventory }} resolved the issue and allowed me to publish the document.
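In other words, the fix was on the SSM side rather than the Terraform side; the offending line in the document JSON ended up as a literal value instead of a parameter reference, roughly:
"customInventory": "Enabled"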

Pass a comma-separated argument to a Spark jar in AWS EMR using the CLI

I am using the AWS CLI to create an EMR cluster and add a step. My create-cluster command looks like:
aws emr create-cluster --release-label emr-5.0.0 --applications Name=Spark --ec2-attributes KeyName=*****,SubnetId=subnet-**** --use-default-roles --bootstrap-action Path=$S3_BOOTSTRAP_PATH --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m4.large InstanceGroupType=CORE,InstanceCount=$instanceCount,InstanceType=m4.4xlarge --steps Type=Spark,Name="My Application",ActionOnFailure=TERMINATE_CLUSTER,Args=[--master,yarn,--deploy-mode,client,$JAR,$inputLoc,$outputLoc] --auto-terminate
$JAR is my Spark jar, which takes two parameters: an input and an output location.
$inputLoc is basically a comma-separated list of input files, like s3://myBucket/input1.txt,s3://myBucket/input2.txt
However, the AWS CLI treats comma-separated values as separate arguments, so my second input file is treated as the second parameter and $outputLoc ends up as s3://myBucket/input2.txt
Is there any way to escape the comma and treat this whole argument as a single value in the CLI command, so that Spark can handle reading multiple files as input?
It seems there is no way to escape the comma in the input file list.
After trying quite a few approaches, I finally resorted to a hack: passing a different delimiter to separate the input files and handling it in the code. In my case, I used % as my delimiter, and in the driver code I do:
// Translate the % delimiter back into the commas Spark expects
if (inputLoc.contains("%")) {
  inputLoc = inputLoc.replaceAll("%", ",")
}
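With that hack in place, the step in the create-cluster call looks something like this (same structure as above, with % in place of the comma between the input files):
--steps Type=Spark,Name="My Application",ActionOnFailure=TERMINATE_CLUSTER,Args=[--master,yarn,--deploy-mode,client,$JAR,s3://myBucket/input1.txt%s3://myBucket/input2.txt,$outputLoc]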