Node is not able to connect to Hub, keeps sending registration events - amazon-web-services

Objective: UI test execution takes quite a long time and we have a lot of UI test cases. We currently have a Grid setup on AWS EC2, but scaling resources up and down manually is time-consuming, so we decided to explore AWS ECS Fargate, where we can scale based on CPU and memory utilization.
Motivation blog: https://aws.amazon.com/blogs/opensource/run-selenium-tests-at-scale-using-aws-fargate/
Problem statement: the Node keeps initiating registration requests but is not able to register itself with the Hub.
Findings so far: I found a repo on GitHub that does what we are trying to achieve, except for one thing: it uses version 3.141.59 and we want version 4.4.0-20220831.
What I have achieved: using this repo, I changed the version of the Hub and Node to 4.4.0-20220831 and also changed the environment variables according to that version's requirements. On executing the CloudFormation template, the Hub is up and running, but no Node is connected. When I checked the Hub and Node logs, I found the Hub service configured and running, while the Node service was sending registration request after registration request.
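For context, this is roughly how the Node's environment is wired into the task definition; a hedged boto3 sketch, where the hub hostname is a placeholder for your service discovery name and the SE_* variables are the standard docker-selenium 4.x event-bus settings:

```python
import boto3

ecs = boto3.client("ecs")

# Hedged sketch: register a Grid 4 node task definition.
# "hub.selenium.local" is a placeholder for the hub's service discovery name.
ecs.register_task_definition(
    family="selenium-node-chrome",
    requiresCompatibilities=["FARGATE"],
    networkMode="awsvpc",
    cpu="1024",
    memory="2048",
    containerDefinitions=[{
        "name": "selenium-node-chrome",
        "image": "selenium/node-chrome:4.4.0-20220831",
        "environment": [
            # Grid 4 replaced the v3 HUB_HOST/HUB_PORT variables with these:
            {"name": "SE_EVENT_BUS_HOST", "value": "hub.selenium.local"},
            {"name": "SE_EVENT_BUS_PUBLISH_PORT", "value": "4442"},
            {"name": "SE_EVENT_BUS_SUBSCRIBE_PORT", "value": "4443"},
        ],
    }],
)
```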
This is my first question here, so I am not able to embed images in the question itself; sorry for the inconvenience.
Hub screenshots: Hub environment, Hub service discovery, Hub logs
Node screenshots: Node environment, Node service discovery, Node logs
Before changing anything, everything works as expected on v3, but we need v4.
Thank you for giving your valuable time; looking forward to your response. Thank you once again.

The problem was not with any of these resources: once I allowed ports 4442 and 4443 in my security group, it worked.
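For anyone scripting this, a minimal boto3 sketch of that fix (the group IDs are placeholders; 4442/4443 are Grid 4's event-bus publish/subscribe ports):

```python
import boto3

ec2 = boto3.client("ec2")

# Placeholder IDs: allow the node's security group to reach the hub's
# event bus ports (4442 = publish, 4443 = subscribe in Selenium Grid 4).
ec2.authorize_security_group_ingress(
    GroupId="sg-hub-placeholder",
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 4442,
        "ToPort": 4443,
        "UserIdGroupPairs": [{"GroupId": "sg-node-placeholder"}],
    }],
)
```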
Thank you everyone for your time and support.

Related

AWS - Log aggregation and visualization

We have a couple of applications running on AWS. Currently we redirect all our logs to a single bucket. However, for ease of access for users, I am thinking of installing the ELK stack on an EC2 instance.
I would like to check whether there is an alternative where I don't have to maintain this stack myself.
Scaling won't be an issue, as this is only for logs generated by applications running on AWS, so no heavy ingestion or processing is required; they are mostly log4j logs.
You can go for either the managed Elasticsearch available in AWS or set up your own on an EC2 instance.
It usually comes down to the price involved and the amount of time you have on hand for setting up and maintaining your own stack.
With your own setup you can do far more configuration than the managed service allows, and it can also help reduce the cost.
You can find more info on this blog
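If you go the managed route, creating a small domain can itself be scripted; a hedged boto3 sketch (the domain name, engine version and sizing below are assumptions, not recommendations):

```python
import boto3

opensearch = boto3.client("opensearch")

# Hedged sketch: a minimal single-node domain for application logs.
opensearch.create_domain(
    DomainName="app-logs",            # placeholder name
    EngineVersion="OpenSearch_2.11",  # pick a current version
    ClusterConfig={
        "InstanceType": "t3.small.search",
        "InstanceCount": 1,
    },
    EBSOptions={
        "EBSEnabled": True,
        "VolumeType": "gp3",
        "VolumeSize": 10,  # GiB
    },
)
```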

AWS Fargate 503 Service Temporarily Unavailable

I'm trying to deploy a backend application to AWS Fargate using CloudFormation templates that I found. When I was using the docker image training/webapp, I was able to deploy it successfully and access it via the externalUrl from the networking stack for the app.
When I try to deploy our backend image, I can see the stacks deploying correctly, but when I go to the externalUrl I get 503 Service Temporarily Unavailable and I'm unable to see the app... Another thing I've noticed on Docker Hub is that the image is pulled continuously, the whole time the CloudFormation services are running...
The backend is a Maven project of some kind; I don't know exactly what, but I know it works locally. However, getting the container with this backend image up and running takes about 8 minutes... I'm not sure if this affects Fargate? Any idea how to get it working?
It sounds like you need to find the actual error you're experiencing; the 503 alone isn't enough information. Can you provide some other context?
I'm not familiar with Fargate, but I have been using ECS quite a bit this year, and I generally find that information by going to (on the dashboard) ECS -> cluster -> service -> events. The Events tab gives more specific errors about what is happening.
My ECS deployment problems generally fall into these categories:
1. The container is not exposing the same port as the task definition expects; this could be the case if you're deploying from a stack written by someone else.
2. The task definition's memory/CPU limits don't grant enough room for the application, so tasks have trouble being placed (probably more an ECS problem than a Fargate one, but you never know).
3. The timeout in the task definition is not set to cover the 8-minute startup: see this question, it has a lot of this covered.
4. The start command in the task definition does not work as expected with the container you're trying to deploy.
If it is pulling from Docker Hub continuously, my bet would be on 1, 3 or 4: the task keeps failing, so ECS attempts to pull the image and start it over and over again.
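If you'd rather pull those service events from a script than from the console, a hedged boto3 sketch (cluster and service names are placeholders):

```python
import boto3

ecs = boto3.client("ecs")

# Placeholder names: fetch the same events shown in the console's Events tab.
resp = ecs.describe_services(cluster="my-cluster", services=["my-service"])
for event in resp["services"][0]["events"][:10]:  # ten most recent events
    print(event["createdAt"], event["message"])
```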
Try adding a health check grace period of 60 seconds by going to ECS -> cluster -> service -> update and adjusting the network access section.
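The same setting can be applied with boto3; a hedged sketch (names are placeholders):

```python
import boto3

ecs = boto3.client("ecs")

# Placeholder names: give new tasks time to boot before the load balancer's
# health checks can mark them unhealthy and trigger a replacement.
ecs.update_service(
    cluster="my-cluster",
    service="my-service",
    healthCheckGracePeriodSeconds=60,  # with an 8-minute startup you may need far more
)
```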

ECS Service restart after deploy new version of docker image

Hi guys,
I have an EC2-backed cluster with a service and an instance. The task is based on the latest version of a Docker image stored in ECR. Now I'm looking for the simplest way to finish my pipeline with an automatic "refresh" of the service whenever the latest image has been deployed. I couldn't find a built-in AWS feature to solve this, but I found this: https://github.com/fdfk/ecsServiceRestart. Unfortunately it doesn't work for me (it can't communicate with my service), but this approach inspired me very much, because the author's solution duplicates the service before the update, which provides something like HA without any downtime. Guys, is it possible to go through these steps without any downtime at all?
deploy a new version of the image,
the service detects the new version of the image,
the service automatically refreshes to run the new version.
Finally I found the best way to achieve my goal. It was very easy: I just used ecs-deploy (https://github.com/fabfuel/ecs-deploy), which I adapted to my pipeline. I set a longer timeout with the no-warning flag, and this script does exactly what I need. In my example I have one cluster with 3 instances and 1 service with two running tasks (two identical nodes behind a load balancer). When I update my Docker image in ECR, ecs-deploy automatically updates the first instance and then, following a blue-green deployment approach, updates the next instances one by one, updating the load balancer links too. This way I achieved fully automated deployment after an accepted merge request (of course I've skipped a few steps in this description). I hope this will be helpful for somebody. Cheers!
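If you'd rather not depend on an external script, roughly the same effect is available natively; a hedged boto3 sketch (names are placeholders) that forces ECS to re-pull the image tag and roll tasks according to the service's deployment settings:

```python
import boto3

ecs = boto3.client("ecs")

# Placeholder names: start a new deployment of the same task definition.
# With enough spare capacity and minimumHealthyPercent >= 100, tasks are
# replaced one by one behind the load balancer, avoiding downtime.
ecs.update_service(
    cluster="my-cluster",
    service="my-service",
    forceNewDeployment=True,
)
```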

How to deploy to autoscaling group with only one active node without downtime

There are two questions about AWS autoscaling + deployment that I cannot clearly answer:
I'm currently trying to figure out the best strategy to deploy to an EC2 instance behind an ELB, where that instance is the only member of an autoscaling group, without downtime.
By now the EC2 setup is done with Puppet, including the deployment of the application, triggered after a successful build by Jenkins.
The best solution I have found is to check, via script, how many instances are registered with the ELB. If a single one is registered, spawn a new one that runs Puppet on startup (so the new node is up to date) and kill the old node.
How do I deploy (autoscaling EC2 behind an ELB) without serving two different versions of the application at the same time?
Possible solution: check via script how many EC2 instances are registered with the ELB, spawn the same number of instances, register all the new ones and deregister all the old ones.
My experience with AWS teaches me that AWS has a service for everything. So are there any services out there that accomplish my requirements and make my workarounds unnecessary?
You can create an entirely new environment with its own ELB, and when it's ready and checked, you switch the DNS record to the new ELB.
Even so, for a brief time (60 seconds or so, depending on the TTL of your DNS record) some users will see your old version while others will see the new version.
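The DNS switch itself is a single Route 53 call; a hedged boto3 sketch (zone IDs, record name and ELB values are all placeholders):

```python
import boto3

route53 = boto3.client("route53")

# Placeholder values: point the record at the new environment's ELB.
route53.change_resource_record_sets(
    HostedZoneId="Z_MY_ZONE_ID",
    ChangeBatch={"Changes": [{
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": "app.example.com.",
            "Type": "A",
            "AliasTarget": {
                "HostedZoneId": "Z_NEW_ELB_ZONE_ID",  # the ELB's canonical zone
                "DNSName": "new-elb-123456.eu-west-1.elb.amazonaws.com.",
                "EvaluateTargetHealth": False,
            },
        },
    }]},
)
```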
In the end there were two possible solutions. Both of them temporarily deliver two versions of the app.
Use AWS CodeDeploy to perform a sequential deployment (one instance after another). This solution offers the possibility to roll back to a previous state and visually shows the state and results of the deployment.
Create a Python script to get the registered nodes (using Boto) and run the appropriate Puppet script on them (using Fabric). This solution offers more control over the deployment but requires some time to build the scripts, and there can be bugs; a sketch of the first half is shown below.
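A hedged sketch of that first half, using boto3 rather than the old boto (the ELB name is a placeholder); the Fabric/Puppet part would consume the addresses it prints:

```python
import boto3

elb = boto3.client("elb")  # classic ELB, as used in this question
ec2 = boto3.client("ec2")

# Placeholder name: find the instances currently registered with the ELB...
health = elb.describe_instance_health(LoadBalancerName="my-elb")
instance_ids = [s["InstanceId"] for s in health["InstanceStates"]
                if s["State"] == "InService"]

# ...and resolve their addresses so a Fabric task could run Puppet on them.
reservations = ec2.describe_instances(InstanceIds=instance_ids)["Reservations"]
for res in reservations:
    for inst in res["Instances"]:
        print(inst["InstanceId"], inst.get("PrivateIpAddress"))
```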
For now I chose AWS CodeDeploy because it's already available and, hopefully, well tested.

Mesos, Marathon, the cloud and 10 data centers - How to talk to each other?

I've been looking into the Mesos, Marathon and Chronos combo to host a large number of websites. In my head, I should be able to type a few commands on my laptop and wait about 30 minutes for the thing to build and deploy.
My only issue is that my resources are scattered across multiple data centers, numerous cloud accounts, and about 6 on-premises locations. I see no reason why I can't control them all from my laptop (I have serious power and control issues when it comes to my hardware!).
I'm thinking my best approach is to build the brains in the cloud (ZooKeeper and at least one master) and then add on the separate data centers, but I have yet to see any example of a distributed cluster where not all the nodes can talk to each other.
Can anyone recommend a way of doing this?
I've got a setup like this that I'd like to recommend:
Source code, deployment scripts and Dockerfiles live in Git.
Each web service has its own directory and comes with a Dockerfile to containerize it.
A build script (a shell script running docker build) builds all the Docker containers, and all the images are pushed to a Docker image repository (a minimal sketch of that loop follows this list).
An Ansible deploy deploys all the containers remotely to a set of VPSes. (You would use your own deployment procedure that fits Mesos/Marathon.)
As part of the process, an ActiveMQ broker is deployed to the cloud (yep, in a container). While deploying, it supplies each node with the URL of the broker it needs to connect to. In your setup you could use ZooKeeper or etcd instead.
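As promised above, a hedged Python sketch of that build loop (the registry name and directory layout are assumptions, and the original is a shell script):

```python
import pathlib
import subprocess

REGISTRY = "registry.example.com/myorg"  # placeholder image repository

# Assumed layout: every service directory at the repo root has a Dockerfile.
for dockerfile in pathlib.Path(".").glob("*/Dockerfile"):
    service = dockerfile.parent.name
    image = f"{REGISTRY}/{service}:latest"
    subprocess.run(["docker", "build", "-t", image, str(dockerfile.parent)],
                   check=True)
    subprocess.run(["docker", "push", image], check=True)
```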
I also use Jenkins to do automatic rebuilds and to run deploys whenever there have been Git commits, but they can also be done manually.
Rebuilds are lightning fast, and deploys don't take much time either. I can replicate everything I have in my repository endlessly and have zero configuration.
To be able to do a new deploy, all I need is a set of VPSes with Docker daemons and some datastores for persistence. I'm not sure if this is something you can replace with Mesos, but Ansible will definitely be able to install a Mesos cloud onto your hardware for you.
All logging is done with Logstash, to a central logging server.
I have set up a 3-master, 5-slave, 1-gateway Mesos/Marathon/Docker cluster and documented it here:
https://github.com/debianmaster/Notes/wiki/Mesos-marathon-Docker-cluster-setup-on-RHEL-7-with-three-master
This may help you understand the load balancing / scaling across different machines in your data center.
1) Masters can also be used as slaves.
2) The Mesos haproxy bridge script can be used for service discovery of newly created services in the cluster (see the sketch after these lists).
3) The gateway haproxy is updated every minute with the new services that have been created.
The documentation covers:
1) master/slave setup
2) setting up haproxy so it reloads automatically
3) setting up Docker
4) an example service program
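To make the haproxy bridge idea from point 2 concrete, a hedged Python sketch of its core: poll Marathon's /v2/tasks endpoint and emit one haproxy server line per running task (the Marathon URL is a placeholder; the real bridge script also rewrites and reloads the haproxy config):

```python
import json
import urllib.request

MARATHON = "http://marathon.example.com:8080"  # placeholder Marathon master URL

# Marathon's /v2/tasks lists every running task with its host and ports.
req = urllib.request.Request(f"{MARATHON}/v2/tasks",
                             headers={"Accept": "application/json"})
with urllib.request.urlopen(req) as resp:
    tasks = json.load(resp)["tasks"]

# Emit haproxy backend entries, one server line per task instance.
for i, task in enumerate(tasks):
    app = task["appId"].strip("/")
    for port in task.get("ports", []):
        print(f"    server {app}-{i} {task['host']}:{port} check")
```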
You should use Terraform to orchestrate your infrastructure as code.
Terraform has a lot of providers that allow you to manage different resources across multiple cloud services and/or bare-metal resources such as vSphere.
You can start with the Getting Started Guide.