Multiple values in EMR Cluster Configuration template - amazon-web-services

Within my EMR module I have a template that is deployed for the cluster configuration. Within this template are all the cluster configuration requirements for the given classification types as specified in the variable emr_cluster_applications, e.g. Spark, Hadoop, Hive.
Visual:
emr_cluster_applications = ["Spark", "Hadoop", "Hive"]
emr_cluster_configurations = file("./filepath/to/template.json")
This setup works fine; however, moving forward I'm wondering if the template can be populated based on the values within the emr_cluster_applications variable.
For example, in a separate deployment, if ["Spark", "Hadoop"] were specified as opposed to all three, then the template file would only use the corresponding Spark and Hadoop configurations, with Hive being ignored although still present in the file. Is this possible?
Update:
Template file:
[
  {
    "Classification": "spark",
    "Properties": {
      "maximizeResourceAllocation": "false",
      "spark.executor.memoryOverhead": "4G"
    }
  },
  {
    "Classification": "hive",
    "Properties": {
      "javax.jdo.option.ConnectionURL": "XXXX",
      "javax.jdo.option.ConnectionDriverName": "XXXX",
      "javax.jdo.option.ConnectionUserName": "XXXX",
      "javax.jdo.option.ConnectionPassword": "XXXX"
    }
  },
  {
    "Classification": "hbase-site",
    "Properties": {
      "hbase.rootdir": "XXXXXXXXXX"
    }
  },
  {
    "Classification": "hbase",
    "Properties": {
      "hbase.emr.storageMode": "s3",
      "hbase.emr.readreplica.enabled": "true"
    }
  }
]

This is the best I could come up with and there might be better solutions, so take this with a grain of salt. I had problems mapping Hadoop to two different elements from the JSON (hbase and hbase-site), so I had to modify the variables to make it work. I strongly suggest doing any variable manipulation within a locals block to avoid clutter in the resources. The locals.tf example:
locals {
  emr_template = [
    {
      "Classification" : "spark",
      "Properties" : {
        "maximizeResourceAllocation" : "false",
        "spark.executor.memoryOverhead" : "4G"
      }
    },
    {
      "Classification" : "hive",
      "Properties" : {
        "javax.jdo.option.ConnectionURL" : "XXXX",
        "javax.jdo.option.ConnectionDriverName" : "XXXX",
        "javax.jdo.option.ConnectionUserName" : "XXXX",
        "javax.jdo.option.ConnectionPassword" : "XXXX"
      }
    },
    {
      "Classification" : "hbase-site",
      "Properties" : {
        "hbase.rootdir" : "XXXXXXXXXX"
      }
    },
    {
      "Classification" : "hbase",
      "Properties" : {
        "hbase.emr.storageMode" : "s3",
        "hbase.emr.readreplica.enabled" : "true"
      }
    }
  ]

  emr_template_mapping = { for template in local.emr_template : template.Classification => template }

  hadoop_enabled = false
  hadoop         = local.hadoop_enabled ? ["hbase", "hbase-site"] : []

  apps_enabled = ["spark", "hive"]

  emr_cluster_applications = concat(local.apps_enabled, local.hadoop)
}
You can manipulate which apps will be added with two options:
If Hadoop is enabled, then hbase and hbase-site need to be added to the list of allowed apps. If it is not enabled, the value of the hadoop local will be an empty list.
In the apps_enabled local variable you decide which of the other apps you want to enable, i.e., spark, hive, neither, or both.
Finally, the emr_cluster_applications local variable uses concat to concatenate the two lists into one.
Then, to create a JSON file locally, you could use the local_file option:
resource "local_file" "emr_template_file" {
content = jsonencode([for app in local.emr_cluster_applications :
local.emr_template_mapping["${app}"] if contains(keys(local.emr_template_mapping), "${app}")
]
)
filename = "${path.root}/template.json"
}
The local_file resource will write a JSON-encoded file which can be used wherever you need it. I am pretty sure there are better ways to do it, so maybe someone else will see this and give a better answer.
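If you don't actually need the file on disk, a variant is to feed the filtered list straight to the cluster resource. A minimal sketch, assuming an aws_emr_cluster resource whose configurations argument accepts a JSON string:

resource "aws_emr_cluster" "this" {
  # ... name, release_label, instances, and other required arguments elided ...

  # Render only the classifications matching the enabled apps, skipping
  # the intermediate local_file on disk.
  configurations = jsonencode([
    for app in local.emr_cluster_applications :
    local.emr_template_mapping[app] if contains(keys(local.emr_template_mapping), app)
  ])
}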

Related

AWS EventBridge Input transformation rule with array List

I have an event containing an ArrayList:
"TelephoneDetails": {
"Telephone": [
{
"Number": "<Number>",
"Type": "<Type>",
"Primary": "<Primary>",
"TextEnabled": "<TextEnabled>"
},{
"Number": "<Number>",
"Type": "<Type>",
"Primary": "<Primary>",
"TextEnabled": "<TextEnabled>"
}
]
}
How do I write the InputTransformer for the InputPath for this?
I can get Telephone[0] using this:
{
  "Type": "$.detail.payload.TelephoneDetails.Telephone[0].Type",
  "Number": "$.detail.payload.TelephoneDetails.Telephone[0].Number",
  "Primary": "$.detail.payload.TelephoneDetails.Telephone[0].Primary",
  "TextEnabled": "$.detail.payload.TelephoneDetails.Telephone[0].TextEnabled"
}
What I don't understand is how to write it if the ArrayList has N elements.
It's as simple as this:
"TextEnabled": "$.detail.payload.TelephoneDetails.Telephone[*].TextEnabled"
Please note the [*] instead of [0], which lets the template engine iterate over the list of Telephones.
I think you can't do this with plain EventBridge syntax. Probably the best way would be to put a Lambda function as the target of your EventBridge rule, do the transformation there, and then forward the result to your target.
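A minimal sketch of that Lambda-as-transformer idea, assuming the event shape from the question and forwarding via a custom event bus (the source, detail-type, and bus names below are hypothetical):

import json

import boto3

events = boto3.client("events")

def handler(event, context):
    # Pull every telephone entry out of the array, however many there are.
    telephones = event["detail"]["payload"]["TelephoneDetails"]["Telephone"]
    transformed = [
        {
            "Number": t["Number"],
            "Type": t["Type"],
            "Primary": t["Primary"],
            "TextEnabled": t["TextEnabled"],
        }
        for t in telephones
    ]
    # Forward the transformed payload, e.g. onto a custom event bus.
    events.put_events(
        Entries=[
            {
                "Source": "my.app.transformer",        # hypothetical
                "DetailType": "TelephoneTransformed",  # hypothetical
                "Detail": json.dumps({"Telephone": transformed}),
                "EventBusName": "my-custom-bus",       # hypothetical
            }
        ]
    )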

Deriving IP Range and Netmask using CloudFormation and Pystache

So, I've got an interesting one: CloudFormation allows the use of Mustache templates (via Pystache) to build configuration files via AWS::CloudFormation::Init (they bury this a few paragraphs down, but it's there).
This is useful to me, as I need to write out some of the network details to create a config file for an OpenVPN server. So far, so good.
But here's where it gets tricky - AWS likes CIDR notation (and I need to use the same parameter for AWS resources and for this). But OpenVPN likes to use the older IP Range and Netmask format. I'm currently trying to find a good way to convert this. I can either use CloudFormation functions or try to find a way to do the transformation in Mustache.
I can get the IP Range using a combination of Fn::Select and Fn::Split to pull the first half of the CIDR, but deriving the netmask currently has me stumped.
Example so far:
"/etc/openvpn/server/configname.conf" : {
  "source" : {
    "Fn::Sub" : [
      "https://${ConfigBucket}.s3.amazonaws.com/Path/To/configname.conf.mustache",
      { "ConfigBucket" : { "Fn::ImportValue" : "ConfigBucket-Export-Name" } }
    ]
  },
  "context" : {
    "VpnCIDR" : { "Ref" : "VpnCIDRRange" },
    "VpnIPRange" : { "Fn::Select" : [ "0", { "Fn::Split" : [ "/", { "Ref" : "VpnCIDRRange" } ] } ] },
    "AwsCIDR" : { "Fn::ImportValue" : { "Fn::Sub" : "${VPCName}-VPC-CIDR" } },
    "AwsIPRange" : { "Fn::Select" : [ "0", { "Fn::Split" : [ "/", { "Fn::ImportValue" : { "Fn::Sub" : "${VPCName}-VPC-CIDR" } } ] } ] }
  }
}
Ok, so I wound up solving this using a simple CloudFormation Macro that takes a CIDR range and returns a JSON object containing the CIDR, subnet and netmask. For example, given 192.168.1.0/24, it would return the following JSON fragment for inclusion in a CloudFormation template:
{
  "CIDR" : "192.168.1.0/24",
  "subnet" : "192.168.1.0",
  "netmask" : "255.255.255.0"
}
The code in question is posted to this gist: https://gist.github.com/AdamLuchjenbroers/3165ab18bb0ee9da95ad6bf514f415e0
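The core of the conversion is small. A minimal sketch of just that part, using Python's ipaddress module (the gist linked above has the full macro handler, including the stack-export lookup):

import ipaddress

def network_info(cidr):
    # ip_network does the CIDR-to-netmask math for us.
    net = ipaddress.ip_network(cidr, strict=False)
    return {
        "CIDR": cidr,
        "subnet": str(net.network_address),
        "netmask": str(net.netmask),
    }

print(network_info("192.168.1.0/24"))
# {'CIDR': '192.168.1.0/24', 'subnet': '192.168.1.0', 'netmask': '255.255.255.0'}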
It can also take the name of a stack export instead (this was required because Fn::ImportValue apparently gets evaluated AFTER Fn::Transform), to enable cross-talk between stacks.
This can then be passed via the context key in the CloudFormation::Init files section like so:
"/etc/openvpn/server/openvpn.conf" : {
"source": {
"Fn::Sub": [
"https://${BucketName}.s3.${Region}.amazonaws.com/Path/To/openvpn.conf.mustache",
{
"BucketName": "BucketNameGoesHere",
"Region": { "Ref": "AWS::Region" }
}
]
},
"context": {
"Vpn": {
"Fn::Transform": {
"Name": "NetworkInfo",
"Parameters": {
"CIDR": { "Ref": "VpnCIDRRange" }
}
}
},
"Aws": {
"Fn::Transform": {
"Name": "NetworkInfo",
"Parameters": {
"CIDR-export": { "Fn::Sub": "${VPCName}-VPC-CIDR" }
}
}
}
}
},
And then referenced in the actual mustache template like so:
push "route {{Aws.subnet}} {{Aws.netmask}}"

How to use multiple global secondary indexes for an AppSync query?

I have three condition terms in a where-like condition, and I have defined their indexes on the DynamoDB table. I need a way to specify all three indexes, if that is good practice, or some other way to query based on the expression.
I also want to know whether the expression is valid or not.
{
  "version" : "2017-02-28",
  "operation" : "Query",
  "query" : {
    ## Also not sure about the query expression. Is it valid?
    "expression": "studentId = :studentId and (chapterId = :chapterId isUserAudio = :isUserAudio)",
    "expressionValues" : {
      ":studentId" : {
        "S" : "${ctx.args.studentId}"
      },
      ":chapterId": {
        "S": "${ctx.args.chapterId}"
      },
      ":isUserAudio": {
        "BOOL": "${ctx.args.isUserAudio}"
      }
    }
  },
  "index": "" # can multiple indexes be specified here?
}
I believe you should be able to use a combination of query expressions and filter expressions to achieve your goal. Try changing your resolver to this:
{
  "version" : "2017-02-28",
  "operation" : "Query",
  "query" : {
    "expression": "studentId = :studentId",
    "expressionValues" : {
      ":studentId" : {
        "S" : "${ctx.args.studentId}"
      }
    }
  },
  "filter" : {
    "expression": "chapterId = :chapterId AND isUserAudio = :isUserAudio",
    "expressionValues" : {
      ":chapterId": {
        "S": "${ctx.args.chapterId}"
      },
      ":isUserAudio": {
        "BOOL": "${ctx.args.isUserAudio}"
      }
    }
  },
  "index": "the-index-with-studentId-as-a-hashkey"
}
This will first query the index and then apply the filter to the results that come back from it. Let me know if that works!
Hope this helps.
You can only Query one table or one index at a time. It is not possible to execute one query that accesses more than one table or index. You will need to Query each index separately and combine the data in your application.
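A minimal boto3 sketch of that query-each-index-and-combine approach (the table name, index names, and the "id" primary key below are hypothetical):

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("StudentAudio")

# One Query per index; a single Query cannot consult two indexes.
by_student = table.query(
    IndexName="studentId-index",
    KeyConditionExpression=Key("studentId").eq("s-123"),
)["Items"]
by_chapter = table.query(
    IndexName="chapterId-index",
    KeyConditionExpression=Key("chapterId").eq("c-1"),
)["Items"]

# Combine in the application, e.g. intersect on the table's primary key.
ids = {item["id"] for item in by_chapter}
combined = [item for item in by_student if item["id"] in ids]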
DynamoDB comparator guide is here. The expression is not valid. Maybe you want:
studentId = :studentId AND chapterId = :chapterId AND isUserAudio = :isUserAudio

Is it possible to create an array pipeline object in AWS datapipeline via Cloudformation?

When creating a data pipeline via API / CLI that creates an EmrCluster, I can specify multiple steps using an array structure:
{ "objects" : [
{ "id" : "myEmrCluster",
"terminateAfter" : "1 hours",
"schedule" : {"ref":"theSchedule"}
"step" : ["some.jar,-param1,val1", "someOther.jar,-foo,bar"] },
{ "id" : "theSchedule", "period":"1 days" }
] }
I can call put-pipeline-definition referencing the file above to create a number of steps for the EMR cluster.
Now if I want to create the pipeline using CloudFormation, I can use the PipelineObjects property in an AWS::DataPipeline::Pipeline resource type to configure the pipeline. However, pipeline object fields can only be of type StringValue or RefValue. How can I create an array pipeline object field?
Here's a corresponding cloudformation template:
"Resources" : {
"MyEMRCluster" : {
"Type" : "AWS::DataPipeline::Pipeline",
"Properties" : {
"Name" : "MyETLJobs",
"Activate" : "true",
"PipelineObjects" : [
{
"Id" : "myEmrCluster",
"Fields" : [
{ "Key" : "terminateAfter","StringValue":"1 hours" },
{ "Key" : "schedule","RefValue" : "theSchedule" },
{ "Key" : "step","StringValue" : "some.jar,-param1,val1" }
]
},
{
"Id" : "theSchedule",
"Fields" : [
{ "Key" : "period","StringValue":"1 days" }
]
}
]
}
}
}
With the above template, step is a StringValue, equivalent to:
"step" : "some.jar,-param1,val1"
and not an array like the desired config.
http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-datapipeline-pipeline-pipelineobjects-fields.html shows that only StringValue and RefValue are valid keys. Is it possible to create an array of steps via CloudFormation?
Thanks in advance.
Ah, I'm not sure where I saw that steps could be configured as an array; the documentation has no mention of that. Instead, it specifies that to define multiple steps, multiple step entries should be used:
{
  "Id" : "myEmrCluster",
  "Fields" : [
    { "Key" : "terminateAfter", "StringValue" : "1 hours" },
    { "Key" : "schedule", "RefValue" : "theSchedule" },
    { "Key" : "step", "StringValue" : "some.jar,-param1,val1" },
    { "Key" : "step", "StringValue" : "someOther.jar,-foo,bar" }
  ]
}

How to pass a list to a nested stack parameter in AWS CloudFormation?

I'm using nested stacks to create ELB and application stacks, and I need to pass a list of subnets to the ELB and application stacks.
The main JSON has the code below:
"Mappings":{
"params":{
"Subnets": {
"dev":[
"subnet-1”,
"subnet-2”
],
"test":[
"subnet-3”,
"subnet-4”,
"subnet-5”,
"subnet-6”
],
"prod":[
"subnet-7”,
"subnet-8”,
"subnet-9”
]
}
}
},
"Parameters":{
"Environment":{
"AllowedValues":[
"prod",
"preprod",
"dev"
],
"Default":"prod",
"Description":"What environment type is it (prod, preprod, test, dev)?",
"Type":"String"
}
},
Resources:{
"ELBStack": {
"Type": "AWS::CloudFormation::Stack",
"Properties": {
"TemplateURL": {
"Fn::Join":[
"",
[
"https://s3.amazonaws.com/",
"myS3bucket",
"/ELB.json"
]
]
},
"Parameters": {
"Environment":{"Ref":"Environment"},
"ELBSHORTNAME":{"Ref":"ELBSHORTNAME"},
"Subnets":{"Fn::FindInMap":[
"params",
"Subnets",
{
"Ref":"Environment"
}
]},
"S3Bucket":{"Ref":"S3Bucket"},
},
"TimeoutInMinutes": "60"
}
}
Now when I run this JSON using Lambda or CloudFormation, I get the below error under the CloudFormation Events tab:
CREATE_FAILED AWS::CloudFormation::Stack ELBStack Value of property Parameters must be an object with String (or simple type) properties
I am using the Lambda below:
import boto3
import time

date = time.strftime("%Y%m%d")
time = time.strftime("%H%M%S")
stackname = 'FulfillSNSELB'
client = boto3.client('cloudformation')
response = client.create_stack(
    StackName=(stackname + '-' + date + '-' + time),
    TemplateURL='https://s3.amazonaws.com/****/**/myapp.json',
    Parameters=[
        {
            'ParameterKey': 'Environment',
            'ParameterValue': 'dev',
            'UsePreviousValue': False
        }
    ]
)

def lambda_handler(event, context):
    return(response)
You can't pass a list to a nested stack. You have to pass a concatenation of items with the intrinsic function Join like this: !Join ["separator", [item1, item2, …]].
In the nested stack, the type of the parameter needs to be List<Type>.
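Applied to the template above, a sketch of the parent stack's Parameters block (names taken from the question):

"Parameters": {
  "Environment": { "Ref": "Environment" },
  "Subnets": {
    "Fn::Join": [ ",", { "Fn::FindInMap": [ "params", "Subnets", { "Ref": "Environment" } ] } ]
  }
}

and in the nested ELB.json, where CloudFormation splits the comma-joined string back into a list:

"Subnets": {
  "Type": "List<AWS::EC2::Subnet::Id>",
  "Description": "Subnets passed down from the parent stack"
}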
Your JSON is not well-formed. Running your JSON through aws cloudformation validate-template (or even jsonlint.com) quickly reveals several basic syntax errors:
Resources:{ requires the key to be surrounded by quotes: "Resources": {
Some of your quotation marks are invalid 'smart quotes' ("subnet-1”) that need to be replaced with standard ASCII quotes: "subnet-1",
(This is the one your error message refers to) The "Properties" object in your "ELBStack" resource has a trailing comma after its last element ("S3Bucket": {"Ref": "S3Bucket"},) that needs to be removed.