Stop HDFS Balancer after using nohup - hdfs

I've launched my HDFS balancer with the nohup command:
$ nohup hdfs balancer &
It is taking forever and I need to work on my cluster. Do you know how I can stop the process?
It's a distributed process, so it seems quite difficult to stop by just doing "kill PID"...
Thank you

Actually, the HDFS Balancer is not a distributed process. It is a single process that decides which blocks to move; the block moves themselves are carried out cooperatively by the DataNodes, but every move is initiated by that one coordinating process (the one created by hdfs balancer). Doing a kill $PID on it is sufficient to stop any further balancing.
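For reference, a minimal way to do that from the node where you launched the balancer (the Java class name below is an assumption based on recent Hadoop releases; check the jps output on your own cluster):
$ jps | grep Balancer          # the balancer JVM shows up as "Balancer"
$ pgrep -f org.apache.hadoop.hdfs.server.balancer.Balancer   # alternative if jps is not on the PATH
$ kill <PID>                   # in-flight block moves finish, no new ones are scheduled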
Source: HDFS Balancer documentation, personal experience.

Related

Does AWS Fargate docker image with express app listening and waiting for requests consume cpu?

I configured an AWS Fargate cluster with a docker image that runs a Node.js Express app and listens on port 80. Now I can browse to the public IP and the request is successfully handled by AWS Fargate.
Is it right that the docker container is now running and still waiting for requests?
Isn't it consuming CPU, meaning I have to pay as long as the docker container is running?
Do I have to build a docker image that just handles a single request and exits to be really serverless?
Thank you
Is it right that the docker container is now running and still waiting for requests? Isn't it consuming CPU, meaning I have to pay as long as the docker container is running?
Yes, that's how ECS Fargate works. It's really no different from running a docker container on your local computer. It has to be up and running all the time in order to handle requests that come in.
Do I have to build a docker image that just handles a single request and exits to be really serverless?
The term "serverless" is a vague marketing term and means different things depending on who you ask. Amazon calls ECS Fargate serverless because you don't have to manage, or even know the details of, the server that is running the container. In contrast to ECS EC2 deployments, where you have to have EC2 servers up and running ahead of time and ECS just starts the containers on those EC2 servers.
If you want something that only runs, and only charges you, when a request comes in, then you would need to reconfigure your application to run on AWS Lambda instead of ECS Fargate.

How do I run Locust Load Distributed testing on AWS EC2 without running multiple sessions?

I'm trying to run a Locust test via EC2 but am running into high CPU usage problems. I would like to distribute the load via master-slave processes, but is there a way to do it without creating multiple EC2 instances and logging into each one to run commands? The closest thing I found was:
https://aws.amazon.com/blogs/devops/using-locust-on-aws-elastic-beanstalk-for-distributed-load-generation-and-testing/
Here they use Elastic Beanstalk, but some of the info seems quite dated.
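For what it's worth, the master/worker split itself is just two invocations of the same locustfile; the harder part, which that blog post automates with Elastic Beanstalk, is getting the commands run on every instance. A rough sketch (file name and IP are placeholders; older Locust releases use --slave instead of --worker):
$ locust -f locustfile.py --master                                      # on the coordinating instance
$ locust -f locustfile.py --worker --master-host=<master-private-ip>    # on each load-generating instance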

Hadoop3 balancer vs disk balancer

I read the Hadoop version 3 documentation about the disk balancer and it said
"Diskbalancer is a command line tool that distributes data evenly on all disks of a datanode.
This tool is different from Balancer which takes care of cluster-wide data balancing."
I really don't know what the difference between 'balancer' and 'disk balancer' is yet.
Could you explain it?
Thank you!
Balancer deals with inter-node data balancing, i.e. balancing data across the multiple DataNodes in the cluster, whereas Disk Balancer deals with the data on the disks of a single DataNode.
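For a concrete feel of the difference, this is roughly how the two tools are invoked (the hostname and plan path below are placeholders; diskbalancer follows the plan/execute workflow described in the Hadoop 3 docs):
$ hdfs balancer                                      # cluster-wide: moves blocks between DataNodes
$ hdfs diskbalancer -plan datanode1.example.com      # single node: compute a move plan for that DataNode's disks
$ hdfs diskbalancer -execute <path-to>/datanode1.example.com.plan.json   # apply the plan on that DataNode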

scaling flask app on AWS ECS with nginx and uwsgi

I am trying to scale a Flask microservice in AWS ECS to handle production workloads. My application is using flask-apscheduler to handle long-running tasks. I am using the uwsgi web server for deployment in ECS, so I am packaging the application inside the container along with the uwsgi server. The nginx container is running separately on the ECS cluster.
My uwsgi config uses a single process, single thread right now.
I have successfully deployed it on AWS ECS but am wondering what to scale for handling production workloads. I am debating between these options:
1) I can spin up multiple containers, and nginx would round-robin to all of them, distributing requests equally, through the Route 53 DNS service
2) I can increase the number of processes in the uwsgi config, but that messes with my flask-apscheduler, as I only need one instance of it running. The workarounds I found are not that neat
It would be great if someone could share how to go about this
The docker mentality is more along the lines of 'one process per container'. Any time you have more than one process running in a container, you should rethink.
I would advise the first approach. Create a service to wrap your task in ECS and simply vary the 'Desired' number of tasks for that service to scale the service up and down as desired.
If you only need the scheduler running in one of the tasks, you should set up a separate service using the same image, but with an environment variable that tells your container to start the scheduler. Make it true on the scheduler service/task and false on the worker service/tasks. These environment variables can be set in the container definition inside your ECS task definition.
This would be the "docker way".
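A minimal sketch of that environment-variable switch, assuming a hypothetical RUN_SCHEDULER variable set in each service's container definition (true on the scheduler service, false on the workers) and a small entrypoint script baked into the shared image:
#!/bin/sh
# entrypoint.sh (hypothetical): one image for both services; behaviour is
# switched by the RUN_SCHEDULER env var from the ECS container definition.
if [ "$RUN_SCHEDULER" = "true" ]; then
    # scheduler service: pass a flag the Flask app can check before it
    # creates its single flask-apscheduler instance (ENABLE_SCHEDULER is illustrative)
    exec uwsgi --ini uwsgi.ini --env ENABLE_SCHEDULER=1
else
    # worker services: plain request-serving uwsgi, no scheduler
    exec uwsgi --ini uwsgi.ini
fi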

How to increase number of sockets on AWS EC2 instance?

I'm trying to run the same batch process (Linux) on an Equinix VM and an EC2 instance. The configuration of the machines is the same, yet the EC2 process is running 10x slower than the Equinix one.
I found the difference in sockets: 8 sockets on Equinix, while only 1 on EC2 (from lscpu).
Note: the batch process utilizes WebURL to fetch features.
I figured that maybe increasing the number of sockets while launching the instance, and then re-running the process, would match the pace of the Equinix VM.
Any leads on how to go about doing it? Or any other hints to improve the performance of the EC2 instance?