I am looking for a way to kill all YARN applications coming from a specific user.
I know that I can use the yarn application -kill [application_ID] command, but I have a list of jobs coming from the same user and I'd like to kill all of them.
More precisely, I would like for instance to kill all jobs coming from dr.who.
Is there any way to do that without killing the jobs one by one?
Thank you for your help.
EDIT
I asked this question because one user was submitting unwanted jobs. I wanted to kill them all while I changed the security settings (set up a firewall and blocked everything from outside).
I did indeed have to use a workaround to kill the running jobs while I was setting up the network: a script based on yarn application -kill, yarn application -list | grep "dr.who" and awk. That script is certainly not a good solution.
for i in `yarn application -list | grep -w dr.who | grep -E -o 'application_[0-9_]*'`; do yarn application -kill $i; done
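For what it's worth, the same loop can be narrowed to applications that are still alive by filtering on state (the -appStates option of yarn application -list takes a comma-separated list of states); this is just a variation of the workaround above:
for i in `yarn application -list -appStates RUNNING,ACCEPTED | grep -w dr.who | grep -E -o 'application_[0-9_]*'`; do yarn application -kill "$i"; done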
I'm currently moving an application off of static EC2 servers to ECS, as until now the release process has been ssh'ing into the server to git pull/migrate the database.
I've created everything I need using Terraform to deploy my code from my organisation's Elastic Container Registry. I have a cluster, some services and task definitions.
I can deploy the app successfully for any given version now, however my main problem is finding a way to run migrations.
My approach so far has been to split the application into three services: a 'web' service which handles all HTTP traffic (serving the frontend, responding to API requests), a 'cron' service which handles things like sending emails/push notifications at specific times/events, and a 'migrate' service, which is just the 'cron' service with the container's entryPoint overridden to run the migrations (I don't need any of the apache2 stuff for this container, and I didn't see a reason to build another image just for migrations).
The problem I had with this was that the 'migrate' service would constantly try to schedule more tasks to migrate the database, even though it only needs to be done once. So I've scrapped it as a service but kept the task definition, so that I can still run it in my cluster.
As part of the deploy process I'm writing, I run that task inside the cluster via a bash script so I can wait until the migrations finish before deciding whether to take the application out of maintenance mode (if the migrations fail) or to deploy the new 'web'/'cron' containers once the migration has been completed.
Currently this is inside a shell script (run by GitHub Actions) that looks like this:
#!/usr/bin/env bash
CLUSTER_NAME=$1
echo $CLUSTER_NAME
OUTPUT=`aws ecs run-task --cluster ${CLUSTER_NAME} --task-definition saas-app-migrate`
# abort the deploy if the run-task call itself failed
if [ $? -ne 0 ]; then
>&2 echo $OUTPUT
exit 1
fi
# extract the ARNs of the spawned tasks from the JSON response
TASKS=`echo $OUTPUT | jq '.tasks[].taskArn' | jq '@sh' | sed -e "s/'//g" | sed -e 's/"//g'`
for task in $TASKS
do
# check for task to be done
done
Because $TASKS contains the taskArn of any tasks that have been spawned by this, I am freely able to query the tasks; however, I don't know what information I'm looking for.
The AWS documentation says I should use the 'describe-tasks' command to find out why a task has reached the 'STOPPED' status, as it provides a 'stopCode' and 'stoppedReason' property in the response. However, it doesn't say what these values would be if the task stopped successfully. I don't want to have to introduce a manual step in my deployment where I wait until the migrations are done - with the application not being usable - to then tell my release process to continue.
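For reference, this is roughly what I imagine the loop body could end up looking like (just a sketch; I'm assuming the container's exit code is what reflects whether the migration command succeeded):
for task in $TASKS
do
  # block until the task reaches the STOPPED status
  aws ecs wait tasks-stopped --cluster ${CLUSTER_NAME} --tasks ${task}
  # a task can stop "cleanly" and still represent a failed migration,
  # so inspect the exit code of the container that ran the migration
  EXIT_CODE=`aws ecs describe-tasks --cluster ${CLUSTER_NAME} --tasks ${task} | jq '.tasks[0].containers[0].exitCode'`
  if [ "${EXIT_CODE}" != "0" ]; then
    >&2 echo "Migration task ${task} stopped with exit code ${EXIT_CODE}"
    exit 1
  fi
done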
Is there a link to documentation I might have missed with the values I'm searching for, or an alternate way to handle this case?
I have a few Docker containers running on EC2 instances in AWS. In the past I have had situations where the Docker containers simply exit due to errors in the Docker daemon, and they never start up again even though restart policies are in place (the daemon is not running, so of course I don't expect them to come up).
Since I am going on holiday I want to implement a quick and easy solution that would allow me to be notified if any containers have exited unexpectedly. The only quick solution I could find was using an Amazon EventBridge rule to run a scheduled task every X minutes that executes a Systems Manager RunDockerAction command (docker ps) on the instances, but this does not give me any output other than the fact that the command executed successfully on the instance.
Is there any way I can get the output of such an EventBridge task and send the results over an SNS topic if things go wrong?
If you are running Linux on your AWS EC2 instance, then one solution is to use e-mail as a notification system. In that case, I would suggest the following:
On the AWS EC2 instance, create a Bash script that runs docker ps -a and combines it with a grep statement to filter on the Docker container IDs that you want to monitor.
In the same Bash script, using echo and mail, you can e-mail yourself the information gathered in the previous step. For example:
echo "${container} is not running" | mail -s "Alert! Docker container ${container} is not running!" "first.last#domain.com"
(The above relies on $container to be set appropriately. Use grep to filter out data of interest.)
Create a system crontab job (/etc/crontab) and schedule the Bash script to run at your desired interval.
This is only one possible solution, one that I use myself for quick checks at times.
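Putting those steps together, a minimal sketch of such a script might look like this (the container names and the e-mail address are placeholders you would adapt):
#!/usr/bin/env bash
# containers we expect to be running (placeholder names)
MONITORED="app-web app-worker"
for container in $MONITORED; do
  # docker ps only lists running containers; grep -qw fails if the name is absent
  # (it also fails if the daemon itself is down, which is what we want to catch here)
  if ! docker ps --format '{{.Names}}' | grep -qw "$container"; then
    echo "${container} is not running" | mail -s "Alert! Docker container ${container} is not running!" "first.last@domain.com"
  fi
done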
I have a bash script. I would like to run it continuously on a Google Cloud server. I connected to my VM via SSH in the browser, but after I closed the browser the script was stopped.
I tried to use Cloud Shell, but if I restart my laptop the script starts again from the beginning. It doesn't run continuously!
Is it possible to launch my script in Google Cloud, shut down my laptop, and be sure that my script keeps running?
The solution: GNU screen. This awesome little tool lets you run a process after you've ssh'ed into your remote server, and then detach from it - leaving it running as it would run in the foreground (not stopped in the background).
So after we've ssh'ed into our GCE VM, we will need to:
1. Install GNU screen:
apt-get update
apt-get upgrade
apt-get install screen
2. Type "screen". This will open up a new screen - kind of similar in look & feel to what "clear" would result in.
3. Run your process (e.g.: ./init-dev.sh to fire up a ChicagoBoss erlang server).
4. Type Ctrl + A, and then Ctrl + D. This will detach your screen session but leave your processes running!
5. Feel free to close the SSH terminal. Whenever you feel like it, ssh back into your GCE VM and type screen -r to resume your previously detached session.
6. To kill all detached screens, run:
screen -ls | grep pts | cut -d. -f1 | awk '{print $1}' | xargs kill
You have the following options:
1. Task schedules - which involve cron jobs. Check this sample (via this answer).
2. Using startup scripts (a minimal sketch follows below).
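For option 2, a minimal sketch could look like the following (the instance name, zone, and paths are placeholders):
#!/bin/bash
# startup.sh - a startup script runs as root every time the instance boots
nohup /home/username/myscript.bash > /home/username/myscript.log 2>&1 &
and then attach it to an existing instance with:
gcloud compute instances add-metadata my-instance --zone us-central1-a --metadata-from-file startup-script=startup.sh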
I performed the following test and it worked for me:
I created an instance in GCE, SSH-d into it and created the following script, myscript.bash:
#!/bin/bash
sleep 15s
echo Hello World > result.txt
and then, ran
$ bash myscript.bash
and immediately closed the browser window holding the SSH session.
I then waited for at least 15 seconds, re-engaged in an SSH connection with the VM in question and ran $ ls and voila:
myscript.bash result.txt
So the script ran even after closing the browser holding the SSH session.
Still, technically, I believe your solution lies with 1. or 2.
You can use
nohup yourscript.sh > output_log_file.log 2>&1 &
I faced a similar issue. I logged into the virtual machine through the gcloud command on my local machine and tried to exit by closing the terminal; it halted the script running in the instance.
Use the exit command to log out of the cloud console in your local machine's PuTTY console (twice).
Make sure you have not enabled "Preemptible instance" while creating the VM instance.
Preemption forces the instance to shut down within 24 hours; it reduces the cost by a huge margin, but it will also stop your script.
I have a NodeJS project and I solved this with pm2.
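In case it helps, a minimal sequence could look like this (myscript.sh is a placeholder; pm2 requires Node.js to be installed):
npm install -g pm2
pm2 start myscript.sh    # pm2 can supervise shell scripts as well as Node apps
pm2 startup              # prints a command to make pm2 start on boot
pm2 save                 # saves the current process list so it is restored on boot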
When issuing a new build to update the code in workers, how do I restart Celery workers gracefully?
Edit:
What I intend to do is something like this (a rough sketch of the signalling step follows the list):
Worker is running, probably uploading a 100 MB file to S3
A new build comes
Worker code has changes
Build script fires signal to the Worker(s)
Starts new workers with the new code
Worker(s) that got the signal exit after finishing their current job.
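Roughly, I picture the signalling step like this (just a sketch; I'm assuming the old workers were started with a pidfile, and the names/paths are placeholders):
# ask the old worker to finish its current task(s) and then exit (warm shutdown)
kill -TERM `cat /var/run/celery/worker1.pid`
# start a replacement worker from the freshly deployed code
celery multi start worker2 -A proj -l info --pidfile=/var/run/celery/%n.pid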
According to https://docs.celeryq.dev/en/stable/userguide/workers.html#restarting-the-worker you can restart a worker by sending a HUP signal
ps auxww | grep celeryd | grep -v "grep" | awk '{print $2}' | xargs kill -HUP
celery multi start 1 -A proj -l info -c4 --pidfile=/var/run/celery/%n.pid
celery multi restart 1 --pidfile=/var/run/celery/%n.pid
http://docs.celeryproject.org/en/latest/userguide/workers.html#restarting-the-worker
If you're going the kill route, pgrep to the rescue:
kill -9 `pgrep -f celeryd`
Mind you, this is not a long-running task and I don't care if it terminates brutally. Just reloading new code during dev. I'd go the restart service route if it was more sensitive.
You can do:
celery multi restart w1 -A your_project -l info # restart workers
You should look at Celery's autoreloading
What should happen to long running tasks? I like it this way: long running tasks should do their job. Don't interrupt them, only new tasks should get the new code.
But this is not possible at the moment: https://groups.google.com/d/msg/celery-users/uTalKMszT2Q/-MHleIY7WaIJ
I have repeatedly tested the -HUP solution using an automated script, but find that about 5% of the time, the worker stops picking up new jobs after being restarted.
A more reliable solution is:
stop <celery_service>
start <celery_service>
which I have used hundreds of times now without any issues.
From within Python, you can run:
import subprocess
service_name = 'celery_service'
for command in ['stop', 'start']:
    subprocess.check_call(command + ' ' + service_name, shell=True)
If you're using docker/docker-compose and putting celery into a separate container from the Django container, you can use
docker-compose kill -s HUP celery
where celery is the container name. The worker will be gracefully restarted and the ongoing task will not be brutally stopped.
I tried pkill, kill, celery multi stop, celery multi restart, and docker-compose restart. None of them worked: either the container is stopped abruptly or the code is not reloaded.
I just want to reload my code on the prod server manually with a one-liner. I don't want to play with daemonization.
Might be late to the party. I use:
sudo systemctl stop celery
sudo systemctl start celery
sudo systemctl status celery
I have a script that takes a lot of time to complete.
Instead of waiting for it to finish, I'd rather just log out and retrieve its output later on.
I've tried:
at -m -t 03030205 -f /path/to/./thescript.pl
nohup /path/to/./thescript.pl &
And I have also verified that the processes actually exist with ps and at -l depending on which scheduling syntax i used.
Both these processes die when I exit out of the shell. Is there a way to keep a script from terminating when I close the connection?
We have crons here and they are set up and are working properly, but I would like to use at or nohup for single-use scripts.
Is there something wrong with my syntax? Are there any other methods to producing the desired outcome?
EDIT:
I cannot use screen or disown - they aren't available in my HP-UX setup and I am not in a position to install them either.
Use screen. It creates a terminal which keeps going when you log out. When you log back in you can switch back to it.
If you want to keep a process running after you log out:
disown -h <pid>
is a useful bash built-in. Unlike nohup, you can run disown on an already-running process.
First, stop your job with control-Z, get the pid from ps (or use echo $!), use bg to send it to the background, then use disown with the -h flag.
Don't forget to background your job or it will be killed when you logout.
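A short example of that sequence (the %1 job spec is just illustrative; check jobs for the actual job number):
$ ./thescript.pl     # started in the foreground by mistake
^Z                   # Ctrl-Z suspends it
$ bg %1              # resume it in the background
$ disown -h %1       # keep it from receiving SIGHUP when the shell exits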
This is just a guess, but something I've seen with some versions of ssh and nohup: if you've logged in with ssh then you may need to need to redirect stdout, stderr and stdin to avoid having the session hang when you exit. (One of those may still be attached to the terminal.) I would try:
nohup /path/to/./thescript.pl > whatever.stdout 2> whatever.stderr < /dev/null &
(This is no longer the case with my current versions of ssh and nohup - the latter redirects them if it detects that any is attached to a terminal - but you may be using different versions.)
Syntax for nohup looks ok, but your account may not allow for processes to run after logout. Also, try redirecting the stdout/stderr to a log file or /dev/null.
Run your command in the background:
/path/to/./thescript.pl &
To get the list of your background jobs:
jobs
Now you can selectively disown any of the above jobs using its job ID:
disown <jobid>
All the disowned processes should keep running even after you log out.