Autoclustering does not work on AWS with RabbitMQ

We are running the latest version of RabbitMQ, v3.7.2, on a few EC2 instances in AWS. We want to use the auto-clustering that ships with the product: Cluster Formation and Peer Discovery.
After we start RabbitMQ, it fails to cluster (or silently skips it). The only message we see in the log file is:
[info] <0.229.0> Peer discovery backend rabbit_peer_discovery_aws does not support registration, skipping registration.
On our RabbitMQ EC2 instance an IAM role is attached with the correct policy. The RabbitMQ config is:
cluster_formation.peer_discovery_backend = rabbit_peer_discovery_aws
cluster_formation.aws.region = eu-west-1
cluster_formation.aws.use_autoscaling_group = true
cluster_formation.aws.use_private_ip = true
Did anyone face this issue?

Add the following to your rabbitmq.conf and restart rabbitmq-server:
log.file.level = debug
This lets you see the discovery request to AWS in the logs.
Then run this on any RabbitMQ node:
rabbitmqctl stop_app
rabbitmqctl reset
rabbitmqctl start_app
This runs the discovery again. Search the RabbitMQ logs for 'AWS Request'; you'll see the corresponding response, which lets you verify whether your EC2 instances were found by the specified tags. If not, something is wrong with your tags.
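If you want to check the same thing outside of RabbitMQ, here is a minimal boto3 sketch of roughly what the rabbit_peer_discovery_aws backend does when use_autoscaling_group = true: find the node's Auto Scaling group, list its members, and resolve their private IPs. The instance ID is a placeholder; run it with credentials from the instance's IAM role to see whether the role's policy permits these calls.

import boto3

region = 'eu-west-1'
instance_id = 'i-0123456789abcdef0'  # placeholder: one of your RabbitMQ nodes

autoscaling = boto3.client('autoscaling', region_name=region)
ec2 = boto3.client('ec2', region_name=region)

# Find the Auto Scaling group this node belongs to.
asg = autoscaling.describe_auto_scaling_instances(InstanceIds=[instance_id])
group_name = asg['AutoScalingInstances'][0]['AutoScalingGroupName']

# List all members of the group; these are the peers that should be discovered.
groups = autoscaling.describe_auto_scaling_groups(AutoScalingGroupNames=[group_name])
member_ids = [i['InstanceId'] for i in groups['AutoScalingGroups'][0]['Instances']]

# Resolve their private IPs (matching use_private_ip = true).
for r in ec2.describe_instances(InstanceIds=member_ids)['Reservations']:
    for i in r['Instances']:
        print(i['InstanceId'], i.get('PrivateIpAddress'))

If any of these calls fails with an authorization error, the IAM policy attached to the instance role is the problem, not the plugin.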

Not an answer (not enough reputation points to comment), but I'm dealing with the same thing. I've double-checked that the security groups are correct (they allow ports 4369, 5672 and 15672, confirmed via telnet/netcat) and that the IAM policies are correct. Debug logging shows nothing else. I'm at a loss as to how to figure this one out.

Related

Error loading Namespaces. Unauthorized: Verify you have access to the Kubernetes cluster

I have created an EKS cluster using the eksctl command line tool and verified that the application is working fine.
But I am noticing a strange issue: when I try to access the nodes in the cluster in the web browser, I see the following error:
Error loading Namespaces
Unauthorized: Verify you have access to the Kubernetes cluster
I am able to see the nodes using kubectl get nodes.
I am logged in as the admin user. Any help on how to work around this would be really great. Thanks.
You will need to add your IAM role/user to your cluster's aws-auth ConfigMap.
Basic steps to follow, taken from https://docs.aws.amazon.com/eks/latest/userguide/add-user-role.html:
kubectl edit -n kube-system configmap/aws-auth
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
data:
  mapRoles: |
    - rolearn: <arn:aws:iam::111122223333:role/eksctl-my-cluster-nodegroup-standard-wo-NodeInstanceRole-1WP3NUE3O6UCF>
      username: <system:node:{{EC2PrivateDNSName}}>
      groups:
        - <system:bootstrappers>
        - <system:nodes>
  mapUsers: |
    - userarn: <arn:aws:iam::111122223333:user/admin>
      username: <admin>
      groups:
        - <system:masters>
    - userarn: <arn:aws:iam::111122223333:user/ops-user>
      username: <ops-user>
      groups:
        - <system:masters>
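If you prefer scripting this over editing the ConfigMap interactively, here is a hedged sketch using the official kubernetes Python client (pip install kubernetes pyyaml) to append a user to mapUsers; the ARN and username are the placeholders from the example above.

import yaml
from kubernetes import client, config

config.load_kube_config()  # uses your current kubectl context
v1 = client.CoreV1Api()

cm = v1.read_namespaced_config_map('aws-auth', 'kube-system')
map_users = yaml.safe_load(cm.data.get('mapUsers', '[]')) or []

# Placeholder identity: substitute the user/role you log into the console with.
map_users.append({
    'userarn': 'arn:aws:iam::111122223333:user/admin',
    'username': 'admin',
    'groups': ['system:masters'],
})

cm.data['mapUsers'] = yaml.safe_dump(map_users)
v1.patch_namespaced_config_map('aws-auth', 'kube-system', cm)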
I'm also seeing this error; it was introduced by the latest addition to EKS, see https://aws.amazon.com/blogs/containers/introducing-the-new-amazon-eks-console/
Since then, the console makes requests to EKS on behalf of the user or role you are logged in as.
So make sure the kube-system:aws-auth ConfigMap has that user or role added.
This user/role might not be the same one you are using locally with the AWS CLI, hence kubectl might work while you still see that error!
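A quick way to see which identity your local credentials actually resolve to (so you can compare it against the user or role shown in the console) is an STS call, for example:

import boto3

# Prints the ARN of the caller; this is the identity that must be in aws-auth.
print(boto3.client('sts').get_caller_identity()['Arn'])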
Amazon recently (December 2020) added a new feature that allows you to browse workloads inside the cluster from the AWS Console.
If you are missing permissions, you will get that error.
The permissions that are needed are described here:
https://docs.aws.amazon.com/eks/latest/userguide/security_iam_id-based-policy-examples.html#policy_example3
This might also be because you created the AWS EKS cluster using a different IAM user than the one currently logged into the AWS Management Console, in which case the logged-in IAM user does not have permission to view the namespaces on the EKS cluster.
Try logging in to the AWS Management Console using the credentials of the IAM user who created the EKS cluster; the issue should be fixed.

AWS Data Migration Service: start replication task issue

I have created an AWS DMS replication instance, a replication task, and source and target endpoints using Terraform.
Now, when I run start-replication-task from the AWS CLI on Windows, it throws this SSL error:
Error running command 'aws dms start-replication-task --start-replication-task-type start-replication --replication-task-arn arn:aws:dms:us-west-2:accountnumber:task:xxxxxxx': exit status 254. Output: C:\Program Files\Amazon\AWSCLIV2\urllib3\connectionpool.py:1020: InsecureRequestWarning: Unverified HTTPS request is being made to host 'dms.us-east-1.amazonaws.com'. Adding certificate verification is strongly advised.
My CLI version is aws-cli/2.1.6 Python/3.7.9 Windows/10 exe/AMD64 prompt/off.
There is no proxy configured.
Any suggestions on this issue?
Thanks
Make sure your credentials are set correctly for where your replication task lives, by pointing [default] at the right profile in ~/.aws/credentials.
To start the replication task you have to fill in two mandatory fields:
{
  "ReplicationTaskArn": "string",
  "StartReplicationTaskType": "string"
}
StartReplicationTaskType --> Valid Values: start-replication | resume-processing | reload-target
start-replication is valid only the first time you run the task after creating it, so you may need to use reload-target instead.
Ref:
https://docs.aws.amazon.com/dms/latest/APIReference/API_StartReplicationTask.html
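For reference, the same call via boto3, sketched with a placeholder ARN; reload-target is shown on the assumption that the task has already been run once:

import boto3

dms = boto3.client('dms', region_name='us-west-2')  # match the region in the task ARN

response = dms.start_replication_task(
    ReplicationTaskArn='arn:aws:dms:us-west-2:123456789012:task:xxxxxxx',  # placeholder
    StartReplicationTaskType='reload-target',  # or 'start-replication' on the first run
)
print(response['ReplicationTask']['Status'])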

Amazon ECS Service configuration: expected exactly 1 result, but got '0'

I am trying to update an ECS service with Bamboo and get the following error:
Failed to fetch resource from AWS!
java.lang.RuntimeException: Expected DescribeServiceRequest for service 'my-service' to return exactly 1 result, but got '0'
    at net.utoolity.atlassian.bamboo.taws.aws.ECS.getSingleService(ECS.java:674)
    at net.utoolity.atlassian.bamboo.taws.ECSServiceTask.executeUpdate(ECSServiceTask.java:311)
    at net.utoolity.atlassian.bamboo.taws.ECSServiceTask.execute(ECSServiceTask.java:133)
    at net.utoolity.atlassian.bamboo.taws.AWSTask.execute(AWSTask.java:164)
    at com.atlassian.bamboo.task.TaskExecutorImpl.lambda$executeTasks$3(TaskExecutorImpl.java:319)
    at com.atlassian.bamboo.task.TaskExecutorImpl.executeTaskWithPrePostActions(TaskExecutorImpl.java:252)
    at com.atlassian.bamboo.task.TaskExecutorImpl.executeTasks(TaskExecutorImpl.java:319)
    at com.atlassian.bamboo.task.TaskExecutorImpl.execute(TaskExecutorImpl.java:112)
    at com.atlassian.bamboo.build.pipeline.tasks.ExecuteBuildTask.call(ExecuteBuildTask.java:73)
    at com.atlassian.bamboo.v2.build.agent.DefaultBuildAgent.executeBuildPhase(DefaultBuildAgent.java:203)
    at com.atlassian.bamboo.v2.build.agent.DefaultBuildAgent.build(DefaultBuildAgent.java:175)
    at com.atlassian.bamboo.v2.build.agent.BuildAgentControllerImpl.lambda$waitAndPerformBuild$0(BuildAgentControllerImpl.java:129)
    at com.atlassian.bamboo.variable.CustomVariableContextImpl.withVariableSubstitutor(CustomVariableContextImpl.java:185)
    at com.atlassian.bamboo.v2.build.agent.BuildAgentControllerImpl.waitAndPerformBuild(BuildAgentControllerImpl.java:123)
    at com.atlassian.bamboo.v2.build.agent.DefaultBuildAgent$1.run(DefaultBuildAgent.java:126)
    at com.atlassian.bamboo.utils.BambooRunnables$1.run(BambooRunnables.java:48)
    at com.atlassian.bamboo.security.ImpersonationHelper.runWith(ImpersonationHelper.java:26)
    at com.atlassian.bamboo.security.ImpersonationHelper.runWithSystemAuthority(ImpersonationHelper.java:17)
    at com.atlassian.bamboo.security.ImpersonationHelper$1.run(ImpersonationHelper.java:41)
    at java.lang.Thread.run(Thread.java:745)
I am using the Force new deployment setting.
Any ideas what the issue is?
We have not been able to identify a bug in our code base right away; here's what seems to be happening:
In order to append progress messages to the Bamboo build log, we need to call the DescribeServices API action before the call to the actual UpdateService API action, and the exception is thrown if and only if the targeted service cannot be found.
So at first glance this may be a subtle configuration issue, which happens to me every now and then when using Bamboo variables to reference resources from a preceding task, where it is easy to accidentally copy and paste the wrong variable name, for example.
An incorrect reference in any of the following parameters of the Amazon ECS Service task's Update Service action would cause the task action to fail with the error message at hand, because the DescribeServices API call itself would succeed, yet fail to identify the target service:
Connector
Region
Service Name
For example, I've just reproduced the problem by using a non-existing service name:
24-Oct-2019 17:37:05 Starting task 'Update sample ECS service (w/ ELB) - 2 instances' of type 'net.utoolity.atlassian.bamboo.tasks-for-aws:aws.ecs.service'
24-Oct-2019 17:37:05 Setting maxErrorRetry=7 and awaitTransitionInterval=15000
24-Oct-2019 17:37:05 Using session credentials provided by Identity Federation for AWS app (connector variable: 6f6fc85d-4ea5-43ce-8e70-25aba33a5fda).
24-Oct-2019 17:37:05 Selecting region eu-west-1
24-Oct-2019 17:37:05 Updating service 'NOT-A-SERVICE' on cluster 'TAWS-IT270-100-ubot':
24-Oct-2019 17:37:06 Failed to fetch resource from AWS!
24-Oct-2019 17:37:06 java.lang.RuntimeException: Expected DescribeServiceRequest for service 'NOT-A-SERVICE' to return exactly 1 result, but got '0'
...
Granted, the error message is not exactly helpful here, and we need to think about how to better handle this log pattern across our various tasks; the actual UpdateService API action would yield the much more appropriate ServiceNotFoundException in this scenario.
So assuming 'my-service' had been up and running before the 'Update Service' task action was called, can you please check whether the log from your failing Bamboo build indicates this particular problem, for example by targeting another region by chance?
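For anyone who wants to run the same pre-flight check outside of Bamboo, here is a minimal boto3 sketch; the cluster, service, and region names are placeholders. A wrong name, cluster, or region does not raise an exception; the service simply lands in the failures list with reason 'MISSING', which is the '0 results' this error reports:

import boto3

ecs = boto3.client('ecs', region_name='eu-west-1')  # placeholder region

resp = ecs.describe_services(cluster='my-cluster', services=['my-service'])
for svc in resp['services']:
    print('found:', svc['serviceName'], svc['status'])
for failure in resp['failures']:
    print('failure:', failure['arn'], failure['reason'])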
I was able to solve the issue by using a Shell Script task and writing an AWS CLI command after exporting the keys. This workaround solved the issue:
aws ecs update-service --cluster my-cluster --service my-service --task-definition my-task-definition
So AWS ECS itself is working fine, and this appears to be a bug or misconfiguration in the Bamboo module.
But as mentioned in the other answer, the best approach is to check whether the configuration is correct.
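For completeness, here is the boto3 equivalent of that CLI workaround, sketched with the same placeholder names; forceNewDeployment mirrors the 'Force new deployment' setting mentioned in the question:

import boto3

ecs = boto3.client('ecs', region_name='eu-west-1')  # placeholder region

ecs.update_service(
    cluster='my-cluster',
    service='my-service',
    taskDefinition='my-task-definition',
    forceNewDeployment=True,  # redeploy even if the task definition is unchanged
)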

AWS SSM describe-instance-information doesn't find my instances

I am using boto3 to control my EC2 instances on AWS from a Python environment, using the ec2 and ssm services. I have created an IAM user that has the AmazonSSMFullAccess and AmazonEC2FullAccess policies attached.
ec2 = boto3.client(
    'ec2',
    region_name='eu-west-1',
    aws_access_key_id='…',
    aws_secret_access_key='…/…+…'
)
ssm = boto3.client(
    'ssm',
    region_name='eu-west-1',
    aws_access_key_id='…',
    aws_secret_access_key='…/…+…'
)
I ran:
ec2.describe_instances()['Reservations']
which returned a list of all my instances.
But when I run:
ssm.describe_instance_information()
I get an empty list, though I have at least one instance running on the Amazon Linux AMI (ami-ca0135b3) and six others on recent Ubuntu AMIs. They are all in eu-west-1 (Ireland).
They should have the SSM Agent preinstalled (https://docs.aws.amazon.com/systems-manager/latest/userguide/sysman-install-ssm-agent.html).
I sshed into the Amazon Linux instance and tried to get the SSM agent logs using:
sudo tail -f /var/log/amazon/ssm/amazon-ssm-agent.log
But nothing happens there when I run my Python code. A sequence of messages gets displayed from time to time:
HealthCheck reporting agent health.
error when calling AWS APIs. error details - NoCredentialProviders: no valid providers in chain. Deprecated.
I also tried running a command through the web interface and selected 'AWS-RunRemoteScript', but no instances are shown below.
My goal is to run:
ssm.send_command(
    DocumentName="AWS-RunShellScript",
    Parameters={'commands': [command]},
    InstanceIds=[instance_id],
)
But it gives me the following error, probably due to the previous problem.
botocore.errorfactory.InvalidInstanceId: An error occurred (InvalidInstanceId) when calling the SendCommand operation
The agent is pre-installed, but the instance (not just your IAM user) still needs the proper role to communicate with Systems Manager; see in particular this step of Configuring Access to Systems Manager:
By default, Systems Manager doesn't have permission to perform actions on your instances. You must grant access by using an IAM instance profile. An instance profile is a container that passes IAM role information to an Amazon EC2 instance at launch.
You should review the whole configuration guide and make sure you have configured all required roles appropriately.
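As an illustration of that step, here is a hedged boto3 sketch that attaches an existing instance profile to a running instance and then re-checks SSM. The profile name is an assumption; it must wrap a role that grants SSM access (e.g. the AmazonSSMManagedInstanceCore managed policy), and the agent can take a few minutes to pick up the new credentials and register:

import boto3

region = 'eu-west-1'
instance_id = 'i-0123456789abcdef0'  # placeholder instance ID

ec2 = boto3.client('ec2', region_name=region)
ssm = boto3.client('ssm', region_name=region)

# Attach an instance profile whose role grants SSM access (name is a placeholder).
ec2.associate_iam_instance_profile(
    IamInstanceProfile={'Name': 'SSMInstanceProfile'},
    InstanceId=instance_id,
)

# Once the agent has registered, the instance should show up here.
for inst in ssm.describe_instance_information()['InstanceInformationList']:
    print(inst['InstanceId'], inst['PingStatus'])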

Spark Cluster on EC2 - "ssh-ready" state pops up for password

I am trying to create a Spark cluster on EC2 with the following command (I am following the Apache documentation):
./spark-ec2 --key-pair=spark-cluster --identity-file=/Users/abc/spark-cluster.pem --slaves=3 --region=us-west-1 --zone=us-west-1c --vpc-id=vpc-2e44594 --subnet-id=subnet-18447841 --spark-version=1.6.1 launch spark-cluster
Once I run the above command, the master and slaves get created, but once the process reaches the 'ssh-ready' state, it keeps waiting for a password.
Below is the trace. I have referred to the official Apache documentation and many other documents/videos, and none of those walkthroughs asked for a password. I am not sure whether I am missing something; any pointer on this issue is much appreciated.
Creating security group spark-cluster-master
Creating security group spark-cluster-slaves
Searching for existing cluster spark-cluster in region us-west-1...
Spark AMI: ami-1a250d3e
Launching instances...
Launched 3 slaves in us-west-1c, regid = r-32249df4
Launched master in us-west-1c, regid = r-5r426bar
Waiting for AWS to propagate instance metadata...
Waiting for cluster to enter 'ssh-ready' state..........Password:
I modified the spark-ec2.py script to include the proxy and enabled an AWS NAT to allow the outbound calls.