ElasticSearch not joining nodes in AWS Cluster - amazon-web-services

I am having issues with making clusters on AWS using ElasticSearch:
Software:
ES: elasticsearch-1.4.1.zip
AWS-Cloud: elasticsearch-cloud-aws/2.4.1
Both are run on AWS EC2 micro instances (Ubuntu 64-bit). Both instances use the same security group with everything open, no restrictions at all.
I have created two instances in us-west Oregon (us-west-2b) and I am using this configuration file:
{
  "cluster.name": "mycluster",
  "http": {
    "cors.enabled": true,
    "cors.allow-origin": "*"
  },
  "node.name": "LosAngeles-node",
  "node.master": "false",
  "cloud": {
    "aws": {
      "access_key": "xxxxxxxxxxxx",
      "secret_key": "xxxxxxxxxxxxxxxxxxxx",
      "region": "us-west"
    }
  },
  "discovery": {
    "type": "ec2",
    "ec2": {
      "groups": "esallaccess"
    },
    "zen": {
      "ping": {
        "multicast": {
          "enabled": "false"
        }
      }
    }
  }
}
The LosAngeles node should be a work horse for the cluster, thus node.master = false.
When I start this node it pings constantly and never stops; this is in the log after I start it:
...
[2014-11-28 15:18:30,593][TRACE][discovery.ec2 ] [LosAngeles-node] building dynamic unicast discovery nodes...
[2014-11-28 15:18:30,593][DEBUG][discovery.ec2 ] [LosAngeles-node] using dynamic discovery nodes []
[2014-11-28 15:18:32,170][TRACE][discovery.ec2 ] [LosAngeles-node] building dynamic unicast discovery nodes...
[2014-11-28 15:18:32,170][DEBUG][discovery.ec2 ] [LosAngeles-node] using dynamic discovery nodes []
[2014-11-28 15:18:32,170][TRACE][discovery.ec2 ] [LosAngeles-node] full ping responses: {none}
[2014-11-28 15:18:32,170][DEBUG][discovery.ec2 ] [LosAngeles-node] filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-11-28 15:18:32,170][TRACE][discovery.ec2 ] [LosAngeles-node] starting to ping
...
I am thinking this is a problem with the region. Any help is appreciated.
P.S.
The master node (NewYork) has the same configuration file with a different node name and node.master = true.

Try adding the master node's address to the new node's configuration.
In elasticsearch.yml, verify the following parameters:
cluster.name: your-cluster-name
node.master: false
node.data: false
discovery.zen.ping.timeout: 3s
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["your-master.dns.domain.com"]
If you use multicast, disable it; it doesn't work in AWS EC2.
In any case, check your security group.
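Once both nodes are up with matching unicast settings, a quick way to confirm they actually joined is to ask the HTTP API for the node list. A minimal sketch, assuming the default HTTP port 9200 is reachable from wherever you run it:
import urllib.request

# Prints one row per node that has joined the cluster; if only one row shows
# up, discovery is still failing.
print(urllib.request.urlopen("http://localhost:9200/_cat/nodes?v", timeout=5).read().decode())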

Your instances need to be allowed to acquire information about each other so that your nodes can discover the available cluster and join it.
The AWS cloud plugin automatically handles joining a node into the cluster once a master is nominated.
Setting the discovery permissions below as a policy and applying it to your IAM role should fix this. Here is the policy I used:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "whatever",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "ec2:DescribeAvailabilityZones",
        "ec2:DescribeInstances",
        "ec2:DescribeRegions",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeTags"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}
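To confirm that the node can actually make the EC2 call the plugin's discovery relies on, a small boto3 check run on the instance itself is enough. A rough sketch, assuming the region is us-west-2 (Oregon) and the security group name from the question:
import boto3

# Uses whatever credentials the node already has (instance profile or access keys).
ec2 = boto3.client("ec2", region_name="us-west-2")

# EC2 discovery is built on DescribeInstances, optionally filtered by security
# group; this roughly mirrors that call.
reservations = ec2.describe_instances(
    Filters=[{"Name": "instance.group-name", "Values": ["esallaccess"]}]
)["Reservations"]
for reservation in reservations:
    for instance in reservation["Instances"]:
        print(instance["InstanceId"], instance.get("PrivateIpAddress"))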

Related

How to create Spot instances with request type of fleet using boto3 and AWS Lambda

I am trying to create spot instances with a request type of 'fleet', using the 't2.medium', 't2.large', 't3a.medium', 't3a.large', 't3.medium', 't3.large' instance types, with boto3.
My code runs, but I am getting 6 different spot requests with request type 'instance', and 6 instances of the same kind, instead of a single fleet request.
Here's my code:
import boto3

def lambda_handler(event, context):
    ec2 = boto3.client('ec2')
    # Create a launch template
    response = ec2.create_launch_template(
        DryRun=False,
        LaunchTemplateName='my-launch-template-1',
        VersionDescription='Initial version',
        LaunchTemplateData={
            'ImageId': 'ami-02b20e5594a5e5398',
            'InstanceType': 't2.medium',
            'Placement': {
                'AvailabilityZone': 'us-east-1a',
            },
            'EbsOptimized': True,
            'Monitoring': {
                'Enabled': True
            },
            'SecurityGroupIds': [
                'sg-053a39faea8548b14',
            ]
        }
    )
    # Get the launch template ID
    launch_template_id = response['LaunchTemplate']['LaunchTemplateId']
    # Set the desired capacity of the fleet to the number of instance types specified
    desired_capacity = 6
    # Set the target capacity of the fleet to the desired capacity
    target_capacity = desired_capacity
    # Create a launch template configuration for each instance type in the fleet
    launch_template_configs = []
    instance_types = ['t2.medium', 't2.large', 't3a.medium', 't3a.large', 't3.medium', 't3.large']
    for instance_type in instance_types:
        launch_template_config = {
            'LaunchTemplateSpecification': {
                'LaunchTemplateId': launch_template_id,
                'Version': '$Latest'
            },
            'Overrides': [
                {
                    'InstanceType': instance_type
                }
            ]
        }
        launch_template_configs.append(launch_template_config)
    # Create the fleet
    response = ec2.create_fleet(
        DryRun=False,
        TargetCapacitySpecification={
            'TotalTargetCapacity': target_capacity,
            'OnDemandTargetCapacity': 0,
            'SpotTargetCapacity': target_capacity,
            'DefaultTargetCapacityType': 'spot'
        },
        LaunchTemplateConfigs=launch_template_configs
    )
    print(response)
This is what my policy looks like:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents",
        "ec2:CreateFleet",
        "ec2:CreateLaunchTemplate",
        "iam:CreateServiceLinkedRole"
      ],
      "Resource": "*"
    },
    {
      "Action": [
        "ec2:RunInstances"
      ],
      "Effect": "Allow",
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": "ec2:RequestSpotInstances",
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": "iam:CreateServiceLinkedRole",
      "Resource": "arn:aws:iam::*:role/aws-service-role/ec2.amazonaws.com/AWSServiceRoleForEC2Fleet",
      "Condition": {
        "StringEquals": {
          "iam:AWSServiceName": "ec2.amazonaws.com"
        }
      }
    }
  ]
}
What I'm getting:
Instead of this, I'm trying to make 1 spot fleet request that provides 't2.medium', 't2.large', 't3a.medium', 't3a.large', 't3.medium', 't3.large' instances.
What I'm trying to achieve:
What am I doing wrong here? How can I have only one fleet request that has six different instance types?
edit-1: Added an image showing what exactly I'm trying to achieve.
EC2 instances in an EC2 fleet are launched based on an allocation strategy. The way this works is that you specify launch templates for each instance type you would like to have in your fleet and AWS will determine what types will be launched based on the strategy chosen.
There are a few strategy types, such as price-capacity-optimized, lowest-price, diversified, etc. You can read about these on the AWS documentation linked above. The default strategy is lowest-price, which means that AWS will try to launch instances from the lowest price pool. This is probably why you get only t3a.medium type instances.
Having these strategies in place means that you cannot explicitly say that I want one instance from each type I specify in the launch_template_configs list. What you can do is to override the default strategy and try to use something like diversified, which will try to distribute instances across all Spot capacity pools.
You can override the strategy in the create_fleet call:
response = ec2.create_fleet(
    DryRun=False,
    TargetCapacitySpecification={
        'TotalTargetCapacity': target_capacity,
        'OnDemandTargetCapacity': 0,
        'SpotTargetCapacity': target_capacity,
        'DefaultTargetCapacityType': 'spot'
    },
    LaunchTemplateConfigs=launch_template_configs,
    SpotOptions={
        'AllocationStrategy': 'diversified',
    },
)
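Separately from the allocation strategy, a single launch template config can carry all of the instance types in its Overrides list, so building six configs in a loop is not strictly necessary. A sketch reusing launch_template_id and instance_types from the question's code:
# One config, many overrides; the fleet then chooses among these types
# according to the selected allocation strategy.
launch_template_configs = [{
    'LaunchTemplateSpecification': {
        'LaunchTemplateId': launch_template_id,
        'Version': '$Latest'
    },
    'Overrides': [{'InstanceType': t} for t in instance_types]
}]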

SSM Document not working, Lambda -> SSM -> EventBridge, I'm stuck and out of ideas

So I've set up a Lambda function using Python 3.9 which calls my SSM document to restart the "ColdFusion 2018 Application Server" service on Windows when the CloudWatch alarm is triggered. I have an EventBridge rule on the alarm state change, which means every time the domain goes down ("ColdFusion service stopped") it should run the SSM document and the PowerShell script. But nothing is working at all, and I've tried practically everything I know of.
Below are the default role for Lambda plus its inline policy, along with my Lambda function, my SSM document, and my EventBridge rule.
My Lambda's default role policy is:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "logs:CreateLogGroup",
      "Resource": "arn:aws:logs:ap-southeast-2:727665054500:*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": [
        "arn:aws:logs:ap-southeast-2:727665054500:log-group:/aws/lambda/johntest:*"
      ]
    }
  ]
}
And here is my inline policy attached to the default role to allow SSM:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": "ssm:SendCommand",
      "Resource": "*"
    }
  ]
}
Lambda function:
import os
import boto3
import json

ssm = boto3.client('ssm')

def lambda_handler(event, context):
    InstanceId = 'i-06692c60000c89460'
    ssmDocument = 'johntest'
    log_group = os.environ['AWS_LAMBDA_LOG_GROUP_NAME']
    targetInstances = [InstanceId]
    response = ssm.send_command(
        InstanceIds=targetInstances,
        DocumentName=ssmDocument,
        DocumentVersion='$DEFAULT',
        CloudWatchOutputConfig={
            'CloudWatchLogGroupName': log_group,
            'CloudWatchOutputEnabled': True
        }
    )
EventBridge (CloudWatch Events) rule, which is the trigger for the Lambda:
{
  "source": [
    "aws.cloudwatch"
  ],
  "detail-type": [
    "CloudWatch Alarm State Change"
  ],
  "detail": {
    "alarmName": [
      "TASS-john2-Testing-SiteDown for domain https://johntest.tassdev.cloud/tassweb"
    ],
    "state": {
      "value": [
        "ALARM"
      ]
    },
    "previousState": {
      "value": [
        "OK"
      ]
    }
  }
}
SSM document to run the PowerShell script. When creating the initial document in SSM, I selected the Command/Session document type, which may be wrong? Do I need to make an Automation document? If so, can someone show me the correct code/syntax please?
---
schemaVersion: "2.2"
description: "Example document"
parameters:
  Message:
    type: "String"
    description: "Example parameter"
    default: "Hello World"
mainSteps:
  - action: "aws:runPowerShellScript"
    name: "example"
    inputs:
      timeoutSeconds: '600'
      runCommand:
        - Restart-Service -DisplayName "ColdFusion 2018 Application Server"
I tried setting up the Lambda function with my instance ID and SSM document name, and the trigger is EventBridge, which is set to my CloudWatch alarm based on state change. I cannot get the SSM document to restart my Windows "ColdFusion" service at all.
I have pasted my code above for the EventBridge rule, SSM document, and Lambda, and even my default Lambda role and inline policy, and it still doesn't work. I also have the SSM Agent installed on my instance, but still nothing.
EDIT: I just went to Systems Manager and clicked Run Command, and it ran the PowerShell script and started up the ColdFusion service. So why isn't it triggering from the CloudWatch alarm?
Cheers.
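One way to narrow down where the chain breaks is to check whether the Lambda's send_command ever reached SSM at all. A rough sketch, assuming the same instance ID as above:
import boto3

ssm = boto3.client('ssm')

# Lists recent Run Command invocations for the instance. If nothing appears
# here after the alarm fires, the failure is in the EventBridge -> Lambda leg,
# not in the SSM document itself.
response = ssm.list_command_invocations(InstanceId='i-06692c60000c89460')
for invocation in response['CommandInvocations']:
    print(invocation['CommandId'], invocation['DocumentName'], invocation['Status'])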

AWS Integration on Kubernetes

I'm having problems setting up AWS integration on a Kubernetes cluster. I've already set the kubernetes.io/cluster/clustername = owned tag on all instances, subnets, the VPC, and on a single security group. I've also passed the --cloud-provider=aws flag to both the API server and the controller manager, but the controller manager does not start.
Controller Manager Logs:
I0411 21:03:48.360194 1 aws.go:1026] Building AWS cloudprovider
I0411 21:03:48.360237 1 aws.go:988] Zone not specified in configuration file; querying AWS metadata service
F0411 21:03:48.363067 1 controllermanager.go:159] error building controller context: cloud provider could not be initialized: could not init cloud provider "aws": error finding instance i-0442e20b4a28b2274: "error listing AWS instances: \"NoCredentialProviders: no valid providers in chain. Deprecated.\\n\\tFor verbose messaging see aws.Config.CredentialsChainVerboseErrors\""
The Policy Attached to the Master Nodes is:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [ "ec2:*" ],
      "Resource": [ "*" ]
    },
    {
      "Effect": "Allow",
      "Action": [ "elasticloadbalancing:*" ],
      "Resource": [ "*" ]
    },
    {
      "Effect": "Allow",
      "Action": [ "route53:*" ],
      "Resource": [ "*" ]
    }
  ]
}
Querying the AWS metadata service from a master via cURL returns proper credentials.
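For reference, the same metadata check can be scripted. A minimal sketch, assuming IMDSv1 is enabled on the instance; it prints the role name and its temporary credentials JSON:
import urllib.request

# Instance metadata endpoint that serves the role credentials the cloud
# provider integration would use.
base = "http://169.254.169.254/latest/meta-data/iam/security-credentials/"
role = urllib.request.urlopen(base, timeout=2).read().decode().strip()
print(role)
print(urllib.request.urlopen(base + role, timeout=2).read().decode())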
Any help will be much appreciated!
P.S.: I'm not using kops or anything of that kind. I've set up the control plane components myself.
I was able to fix this by passing the --cloud-provider=aws flag to the kubelets. I thought that wasn't needed on Master nodes.
Thanks!

AWS Service Unable To Assume Role

I've two AWS Cloudformation stacks, one for IAM roles and the second to create an AWS service and import the respective roles into it using Cloudformation.
When 10+ services are deployed the following error appears randomly on 1 or 2 of the services -
AWS::ECS::Service service Unable to assume role and validate the
listeners configured on your load balancer. Please verify that the ECS
service role being passed has the proper permissions.
If all the services are torn down and then redeployed to the ECS cluster, the error appears again, but for different services.
The AWS fix for this can be seen here
If the 1 or 2 broken services are torn down and redeployed the services deploy without issue. So the problem appears to only occur when many services are deployed at the same time - this indicates it may be an IAM propagation timing issue within Cloudformation.
I've tried adding DependsOn in the service definition -
"service" : {
"Type" : "AWS::ECS::Service",
"DependsOn" : [
"taskdefinition",
"ECSServiceRole"
],
"Properties" : {
"Cluster" : { "Ref": "ECSCluster"},
"Role" : {"Ref" : "ECSServiceRole"},
etc...
}
}
But this doesn't work.
As you can see, I've also removed the imported IAM value for the ECSServiceRole and replaced it with an inline resource policy, seen here -
"ECSServiceRole" : {
"Type" : "AWS::IAM::Role",
"Properties" : {
"AssumeRolePolicyDocument" : {
"Statement" : [
{
"Sid": "",
"Effect" : "Allow",
"Principal" : {
"Service" : [
"ecs.amazonaws.com"
]
},
"Action" : [
"sts:AssumeRole"
]
}
]
},
"Path" : "/",
"Policies" : [
{
"PolicyName" : "ecs-service",
"PolicyDocument" : {
"Statement" : [
{
"Effect" : "Allow",
"Action" : [
"ec2:Describe*",
"ec2:AuthorizeSecurityGroupIngress",
"elasticloadbalancing:DeregisterInstancesFromLoadBalancer",
"elasticloadbalancing:DeregisterTargets",
"elasticloadbalancing:Describe*",
"elasticloadbalancing:RegisterInstancesWithLoadBalancer",
"elasticloadbalancing:RegisterTargets",
"sns:*"
],
"Resource" : "*"
}
]
}
}
]
}
}
But again - the inline policy doesn't fix the issue either.
Any ideas or pointers would be much appreciated!
In reply to answer 1:
Thank you - I wasn't aware of this improvement.
Is this the correct way to associate the service-linked role for ECS?
"ECSServiceRole": {
"Type": "AWS::IAM::Role",
"Properties": {
"AssumeRolePolicyDocument": {
"Statement": [
{
"Sid": "",
"Effect": "Allow",
"Principal": {
"Service": [
"ecs.amazonaws.com"
]
},
"Action": [
"sts:AssumeRole"
]
}
]
},
"Path": "/",
"Policies": [
{
"PolicyName": "CreateServiceLinkedRoleForECS",
"PolicyDocument": {
"Statement": [
{
"Effect": "Allow",
"Action": [
"iam:CreateServiceLinkedRole",
"iam:PutRolePolicy",
"iam:UpdateRoleDescription",
"iam:DeleteServiceLinkedRole",
"iam:GetServiceLinkedRoleDeletionStatus"
],
"Resource": "arn:aws:iam::*:role/aws-service-role/ecs.amazonaws.com/AWSServiceRoleForECS*",
"Condition": {
"StringLike": {
"iam:AWSServiceName": "ecs.amazonaws.com"
}
}
}
]
}
}
]
}
}
Final Answer
After months of intermittent, ongoing issues with AWS regarding this matter, AWS came back to say they were throttling us in the background on the ELB. This is why the random and varied issues were appearing when deploying 3+ Docker services via CloudFormation at the same time. The solution had nothing to do with IAM permissions; rather, it was to increase the rate limit on the ELB via the "AWS Service Team".
So the fix was to continue using the two-stack approach in CloudFormation, one with the IAM roles, which in turn were imported into the service layer stack, and to add a DependsOn in the service definition for all of the other stack resources in the service layer script. This allows sufficient time for the IAM roles to be imported and used by the service, so this was a CloudFormation resource creation timing issue.
"service" : {
"Type" : "AWS::ECS::Service",
"DependsOn" : [
"TaskDefinition",
"EcsElasticLoadBalancer",
"DnsRecord"
],
"Properties" : {
etc...
}
}
UPDATE:
As of July 19th 2018, it is now possible to create an IAM Service-Linked Role using CloudFormation: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-iam-servicelinkedrole.html.
EcsServiceLinkedRole:
  Type: "AWS::IAM::ServiceLinkedRole"
  Properties:
    AWSServiceName: "ecs.amazonaws.com"
    Description: "Role to enable Amazon ECS to manage your cluster."
OLD ANSWER:
Creating your own ECSServiceRole is no longer required. By not specifying a role for your service, AWS will default to using the ECS service-linked role. If your AWS account is recent enough, or you have already created a cluster via the console, you don't have to do anything for this to work. If not, run the following command to create the role: aws iam create-service-linked-role --aws-service-name ecs.amazonaws.com.
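If you prefer Python over the CLI, boto3 exposes the same operation. A short sketch, assuming default credentials; it raises an error if the role already exists:
import boto3

iam = boto3.client("iam")

# Creates AWSServiceRoleForECS, the ECS service-linked role; equivalent to the
# CLI command above.
iam.create_service_linked_role(AWSServiceName="ecs.amazonaws.com")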

Minimal IAM policy for ec2:RunInstances

I'm trying to narrow down the minimal policy to run a predefined machine image. The image is based on two snapshots and I only want "m1.medium" instance types to be launched.
Based on that and with the help of this page and this article, I worked out the following policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1385026304010",
      "Effect": "Allow",
      "Action": [
        "ec2:RunInstances"
      ],
      "Condition": {
        "StringEquals": {
          "ec2:InstanceType": "m1.medium"
        }
      },
      "Resource": [
        "arn:aws:ec2:us-east-1::instance/*",
        "arn:aws:ec2:us-east-1::image/ami-f1c3e498",
        "arn:aws:ec2:us-east-1::snapshot/snap-e2f51ffa",
        "arn:aws:ec2:us-east-1::snapshot/snap-18ca2000",
        "arn:aws:ec2:us-east-1::key-pair/shenton",
        "arn:aws:ec2:us-east-1::security-group/sg-6af56d02",
        "arn:aws:ec2:us-east-1::volume/*"
      ]
    }
  ]
}
The policy narrows down the exact image, snapshots, security group and key-pair while leaving the specific instance and volume open.
I'm using the CLI tools as follows, as described here:
aws ec2 run-instances --dry-run \
--image-id ami-f1c3e498 \
--key-name shenton \
--security-group-ids sg-6af56d02 \
--instance-type m1.medium
The ~/.aws/config is as follows:
[default]
output = json
region = us-east-1
aws_access_key_id = ...
aws_secret_access_key = ...
The command results in a generic "You are not authorized to perform this operation" message, and the encoded authorization failure message indicates that none of my statements were matched, so the action is rejected.
Changing to "Resource": "*" resolves the issue obviously, but I want to gain more understanding as to why the above doesn't work. I fully realize that this involves some degree of guess work, so I welcome any ideas.
I've been contacted by Jeff Barr from Amazon Web Services and he kindly helped me find out what the issue was.
First you need to decode the authorization failure message using the following statement:
$ aws sts decode-authorization-message --encoded-message 6gO3mM3p....IkgLj8ekf
Make sure the IAM user / role has permission for the sts:DecodeAuthorizationMessage action.
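The same call is available through boto3 if that is more convenient. A short sketch, where encoded_message stands in for the blob taken from the error response:
import boto3
import json

sts = boto3.client("sts")
decoded = sts.decode_authorization_message(EncodedMessage=encoded_message)["DecodedMessage"]
# DecodedMessage is itself a JSON string; pretty-print it for readability.
print(json.dumps(json.loads(decoded), indent=2))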
The response contains a DecodedMessage key comprising another JSON encoded body:
{
  "allowed": false,
  "explicitDeny": false,
  "matchedStatements": {
    "items": []
  },
  "failures": {
    "items": []
  },
  "context": {
    "principal": {
      "id": "accesskey",
      "name": "testuser",
      "arn": "arn:aws:iam::account:user/testuser"
    },
    "action": "ec2:RunInstances",
    "resource": "arn:aws:ec2:us-east-1:account:instance/*",
    "conditions": { ... }
  }
}
Under context => resource it will show what resource it was attempting to match against the policy; as you can see, it expects an account number. The arn documentation should therefore be read as:
Unless otherwise specified, the region and account are required.
Adding the account number or * to the affected ARNs fixed the problem:
"Resource": [
"arn:aws:ec2:us-east-1:*:instance/*",
"arn:aws:ec2:us-east-1:*:image/ami-f1c3e498",
"arn:aws:ec2:us-east-1:*:snapshot/snap-e2f51ffa",
"arn:aws:ec2:us-east-1:*:snapshot/snap-18ca2000",
"arn:aws:ec2:us-east-1:*:key-pair/shenton",
"arn:aws:ec2:us-east-1:*:security-group/sg-6af56d02",
"arn:aws:ec2:us-east-1:*:volume/*"
]