I want to dynamically increment the environment variables (dates) for my AWS Data Pipeline. Has anyone achieved this through a ShellCommandActivity by changing the config.json file?
{
  "values": { ... }
}
I'm not sure what you are trying to achieve, but you can use nested expressions anywhere in your pipeline definition, e.g.:
#{format(@scheduledStartTime, 'YYYY-MM-dd')}
For instance, as a parameter that you can then reference in your "command":
"parameters": [
{
"id": "myDate",
"type" : "DateTime"
}
],
"values": {
"myDate": "#{minusDays(myDateTime,1)}"
}
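That parameter can then be referenced from the activity itself; a minimal sketch of a ShellCommandActivity that echoes it (the object ids and the referenced EC2 resource are illustrative, not from the original pipeline):
{
  "id": "MyShellActivity",
  "type": "ShellCommandActivity",
  "runsOn": { "ref": "MyEC2Resource" },
  "command": "echo #{myDate}"
}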
Or get a date as part of the shell command that is being executed:
date -d "1 day ago"
(That is GNU date syntax, which is what the Linux instances that run ShellCommandActivity typically provide; the BSD/macOS equivalent is date -v -1d.)
More info:
http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-pipeline-expressions.html
I am trying to run a script via the cfn-init command, but it keeps timing out.
What am I doing wrong when running startup_script.sh?
"WebServerInstance" : {
"Type" : "AWS::EC2::Instance",
"DependsOn" : "AttachGateway",
"Metadata" : {
"Comment" : "Install a simple application",
"AWS::CloudFormation::Init" : {
"config" : {
"files": {
"/home/ec2-user/startup_script.sh": {
"content": {
"Fn::Join": [
"",
[
"#!/bin/bash\n",
"aws s3 cp s3://server-assets/startserver.jar . --region=ap-northeast-1\n",
"aws s3 cp s3://server-assets/site-home-sprint2.jar . --region=ap-northeast-1\n",
"java -jar startserver.jar\n",
"java -jar site-home-sprint2.jar --spring.datasource.password=`< password.txt` --spring.datasource.username=`< username.txt` --spring.datasource.url=`<db_url.txt`\n"
]
]
},
"mode": "000755"
}
},
"commands": {
"start_server": {
"command": "./startup_script.sh",
"cwd": "~",
}
}
}
}
},
The file part works fine and it creates the file, but it times out when running the command.
What is the correct way to execute the shell script?
You can tail the logs in /var/log/cfn-init.log to spot the issue while the script runs.
The commands in AWS::CloudFormation::Init are run as the root user by default. The likely problem is that your script resides in /home/ec2-user/ while you are trying to run it from '~' (i.e. /root).
Please give the absolute path (/home/ec2-user) in cwd; that should resolve the issue.
However, the exact cause can only be confirmed from the logs.
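For reference, a minimal sketch of the commands block from the question with the absolute path substituted for '~':
"commands": {
  "start_server": {
    "command": "./startup_script.sh",
    "cwd": "/home/ec2-user"
  }
}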
Usually the init scripts are executed by root unless specified otherwise. Can you try giving the full path when running your startup script? You can also give cloudkast a try; it is an online CloudFormation template generator that makes it easier to create objects such as AWS::CloudFormation::Init.
I want to tune my Spark cluster on AWS EMR, but I cannot change the default value of spark.driver.memory, which makes every Spark application crash because my dataset is big.
I tried editing the spark-defaults.conf file manually on the master machine, and I also tried configuring it directly with a JSON file in the EMR console while creating the cluster.
Here's the JSON file used:
[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.driver.memory": "7g",
      "spark.driver.cores": "5",
      "spark.executor.memory": "7g",
      "spark.executor.cores": "5",
      "spark.executor.instances": "11"
    }
  }
]
After using the JSON file, the configuration shows up correctly in spark-defaults.conf, but the Spark dashboard still reports the default value of 1000M for spark.driver.memory, while the other values are applied correctly. Has anyone run into the same problem?
Thank you in advance.
You need to set
maximizeResourceAllocation=true
in the spark classification (not spark-defaults):
[
  {
    "Classification": "spark",
    "Properties": {
      "maximizeResourceAllocation": "true"
    }
  }
]
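Note that maximizeResourceAllocation lives under the spark classification, while explicit values stay under spark-defaults; both can be passed in the same configuration list if needed. A sketch reusing the driver setting from the question (whether explicit spark-defaults values win over the computed ones is worth verifying for your EMR release):
[
  {
    "Classification": "spark",
    "Properties": {
      "maximizeResourceAllocation": "true"
    }
  },
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.driver.memory": "7g"
    }
  }
]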
I am unable to set environment variables for my Spark application. I am using AWS EMR to run a Spark application, which is really a framework I wrote in Python on top of Spark that runs multiple Spark jobs according to the environment variables present. So in order to start the right job, I need to pass an environment variable into spark-submit. I tried several methods to do this, but none of them works: when I try to print the value of the environment variable inside the application, it comes back empty.
To launch the cluster in EMR I am using the following AWS CLI command:
aws emr create-cluster --applications Name=Hadoop Name=Hive Name=Spark --ec2-attributes '{"KeyName":"<Key>","InstanceProfile":"<Profile>","SubnetId":"<Subnet-Id>","EmrManagedSlaveSecurityGroup":"<Group-Id>","EmrManagedMasterSecurityGroup":"<Group-Id>"}' --release-label emr-5.13.0 --log-uri 's3n://<bucket>/elasticmapreduce/' --bootstrap-action 'Path="s3://<bucket>/bootstrap.sh"' --steps file://./.envs/steps.json --instance-groups '[{"InstanceCount":1,"InstanceGroupType":"MASTER","InstanceType":"c4.xlarge","Name":"Master"}]' --configurations file://./.envs/Production.json --ebs-root-volume-size 64 --service-role EMRRole --enable-debugging --name 'Application' --auto-terminate --scale-down-behavior TERMINATE_AT_TASK_COMPLETION --region <region>
Now Production.json looks like this:
[
  {
    "Classification": "yarn-env",
    "Properties": {},
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "FOO": "bar"
        }
      }
    ]
  },
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.executor.memory": "2800m",
      "spark.driver.memory": "900m"
    }
  }
]
And steps.json looks like this:
[
  {
    "Name": "Job",
    "Args": [
      "--deploy-mode", "cluster",
      "--master", "yarn", "--py-files",
      "s3://<bucket>/code/dependencies.zip",
      "s3://<bucket>/code/__init__.py",
      "--conf", "spark.yarn.appMasterEnv.SPARK_YARN_USER_ENV=SHAPE=TRIANGLE",
      "--conf", "spark.yarn.appMasterEnv.SHAPE=RECTANGLE",
      "--conf", "spark.executorEnv.SHAPE=SQUARE"
    ],
    "ActionOnFailure": "CONTINUE",
    "Type": "Spark"
  }
]
When I try to access the environment variable inside my __init__.py code, it simply prints empty. As you can see, I am running the step with Spark on YARN in cluster mode. I went through these links to get this far:
How do I set an environment variable in a YARN Spark job?
https://spark.apache.org/docs/latest/configuration.html#environment-variables
https://spark.apache.org/docs/latest/configuration.html#runtime-environment
Thanks for any help.
Use classification yarn-env to pass environment variables to the worker nodes.
Use classification spark-env to pass environment variables to the driver, with deploy mode client. When using deploy mode cluster, use yarn-env.
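For the client deploy mode case, a spark-env block mirrors the yarn-env block from the question (FOO/bar are the question's placeholder values; the nested export classification works the same way):
[
  {
    "Classification": "spark-env",
    "Properties": {},
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "FOO": "bar"
        }
      }
    ]
  }
]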
To work with EMR clusters I use AWS Lambda, with a project that builds an EMR cluster when a flag is set in a condition.
Inside this project, we define the variables that you can set in the Lambda and then replace with their actual values. To do this we use the AWS API; the method to call is AWSSimpleSystemsManagement.getParameters.
Then build a map such as val parametersValues = parameterResult.getParameters.asScala.map(k => (k.getName, k.getValue)) to get (name, value) tuples.
E.g.: ${BUCKET} = s3://bucket-name
This means you only have to write ${BUCKET} in your JSON instead of the full path.
With the placeholder in place, the step JSON can look like this:
[
  {
    "Name": "Job",
    "Args": [
      "--deploy-mode", "cluster",
      "--master", "yarn", "--py-files",
      "${BUCKET}/code/dependencies.zip",
      "${BUCKET}/code/__init__.py",
      "--conf", "spark.yarn.appMasterEnv.SPARK_YARN_USER_ENV=SHAPE=TRIANGLE",
      "--conf", "spark.yarn.appMasterEnv.SHAPE=RECTANGLE",
      "--conf", "spark.executorEnv.SHAPE=SQUARE"
    ],
    "ActionOnFailure": "CONTINUE",
    "Type": "Spark"
  }
]
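After the Lambda substitutes the example value above (${BUCKET} = s3://bucket-name), the Args would come out roughly as:
"Args": [
  "--deploy-mode", "cluster",
  "--master", "yarn", "--py-files",
  "s3://bucket-name/code/dependencies.zip",
  "s3://bucket-name/code/__init__.py",
  ...
]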
I hope this can help you to solve your problem.
I have created an instance using a CloudFormation template and configured it with user data and PowerShell DSC. I have created an AMI from this instance so that next time it speeds up my stack creation.
Now how can I use this AMI in that same template so that it bypasses all the configuration and installation done on the instance and directly sends the success signal to the wait handler?
I am trying this in my template but it is failing.
Thanks in Advance,
Lokesh Jangir
It sounds like you need a check in your user data to see if everything is already configured, and if it is, then you just stop and send the notification instead of setting it back up again.
Ultimately, it sounds like it'd be easier to have two templates - one to create the AMI, and one to re-use it in other settings. The second template could take the AMI ID as a parameter so that it is more flexible and can be used with different AMIs as you create them.
1. To use your AMI id in the CloudFormation template, start by adding a parameter so that you can easily change it:
"Parameters": {
  ...
  "amiId": {
    "Type": "String",
    "Default": "ami-073bb070",
    "AllowedPattern": "[a-zA-Z0-9\\-]*",
    "Description": "Only [a-zA-Z0-9\\-]* allowed."
  },
  ...
}
2. Use that parameter in a LaunchConfiguration:
"aLaunchConfig": {
  "Type": "AWS::AutoScaling::LaunchConfiguration",
  "Properties": {
    "ImageId": { "Ref" : "amiId" },
    ...
3. Or use it directly in an EC2 instance:
"someEC2": {
  "Type": "AWS::EC2::Instance",
  "Properties": {
    "ImageId": { "Ref" : "amiId" },
    ...
The command I use:
aws s3api put-bucket-notification-configuration --bucket bucket-name --notification-configuration file:///Users/chris/event_config.json
Works fine if I take out the "Filter" key. As soon as I add it in, I get:
Parameter validation failed:
Unknown parameter in NotificationConfiguration.LambdaFunctionConfigurations[0]: "Filter", must be one of: Id, LambdaFunctionArn, Events
Here's my JSON file:
{
  "LambdaFunctionConfigurations": [
    {
      "LambdaFunctionArn": "arn:aws:lambda:us-east-1:000000000:function:name",
      "Events": [
        "s3:ObjectCreated:*"
      ],
      "Filter": {
        "Key": {
          "FilterRules": [
            {
              "Name": "prefix",
              "Value": "images/"
            }
          ]
        }
      }
    }
  ]
}
When I look at the command's docs (http://docs.aws.amazon.com/cli/latest/reference/s3api/put-bucket-notification-configuration.html), I don't see any mistake. I've tried copy/pasting, carefully looking over, etc... Any help would be greatly appreciated!
You need to be running at least version 1.7.46 of aws-cli, released 2015-08-20.
This release adds Amazon S3 support for event notification filters and fixes some issues.
https://aws.amazon.com/releasenotes/CLI/3585202016507998
The aws-cli utility contains a lot of built-in intelligence and validation logic. New features often require the code in aws-cli to be updated, and Filter on S3 event notifications is a relatively recent feature.
See also: https://aws.amazon.com/blogs/aws/amazon-s3-update-delete-notifications-better-filters-bucket-metrics/