How to do a graceful YARN role shutdown on a Cloudera Manager DataNode (CDH 6.3.2) - mapreduce

I cannot find an answer to this question.
How do I gracefully stop the YARN role on a DataNode and wait until all jobs running on that node finish successfully?
I know that in Cloudera Manager you can decommission the YARN role before you stop it.
However, if I decommission the YARN role, the running jobs fail with a killed or crashed status.
Is this a safe way to stop the YARN role on a DataNode?
Is this a graceful YARN role shutdown, or is there another way to do it?
All jobs have a killed status after the YARN role decommission.

This is poorly documented on the Apache website for Hadoop 3.3:
Create an XML file listing the NodeManagers you wish to decommission:
<?xml version="1.0"?>
<hosts>
<host><name>host1</name></host> <!-- normal 'kill' -->
<host><name>host2</name><timeout>123</timeout></host> <!-- allows jobs 123 seconds to finish -->
<host><name>host3</name><timeout>-1</timeout></host><!-- allows jobs infinite seconds to finish -->
</hosts>
Update your config (yarn-site.xml) to point to this file (no restart required):
yarn.resourcemanager.nodes.exclude-path=[path/to/exclude/file]
Run the update to initiate the decommission:
yarn rmadmin -refreshNodes
Alternatively, you could set a graceful timeout for all nodes:
yarn.resourcemanager.nodemanager-graceful-decommission-timeout-secs
Or you can manually set a graceful timeout on the command line:
yarn rmadmin -refreshNodes -g [timeout in seconds] -client
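As a quick check after the refresh, you can watch the NodeManager states from the command line; this is a minimal sketch (the host:port NodeId is a placeholder, and the -states filter assumes a reasonably recent yarn CLI):
# List nodes that are draining or have already been decommissioned
yarn node -list -states DECOMMISSIONING,DECOMMISSIONED
# Inspect a single NodeManager by its NodeId (placeholder host:port)
yarn node -status host1.example.com:8041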

YARN graceful decommission will wait for running jobs to complete. You can pass a timeout value so that YARN forces the decommission after x seconds; if no jobs are running (or they finish sooner), YARN decommissions the node immediately without waiting for the timeout to expire.
CM -> Clusters -> YARN -> Configuration -> in the search bar, look for
yarn.resourcemanager.nodemanager-graceful-decommission-timeout-secs
Set the value, save the configuration, and restart to deploy the configs.
To decommission a specific host or multiple hosts:
CM -> Clusters -> YARN -> Instances (select the hosts that you want to decommission)
Click -> Actions for Selected Hosts -> Decommission
If you want to decommission all the roles of a host, follow this doc:
https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/cm_mc_host_maint.html#decomm_host


How to notify/alert when the SSH/RDP service is not running in a GCE VM

GCP's Operations suite does not have a pre-built metric to monitor whether the SSH (Linux) / RDP (Windows) service is running in a GCE VM. I have also looked in Cloud Logging but could not find any logs about the service running in the OS.
Is there any other way to monitor the health of the SSH / RDP service in GCE VMs, and to forward an alert if it is not running?
You can use Cloud Logging to monitor the status of the SSH daemon and connections to it on your VM instances:
Install the Logging agent, which sends logs to Cloud Logging (formerly Stackdriver):
curl -sSO https://dl.google.com/cloudagents/add-logging-agent-repo.sh
sudo bash add-logging-agent-repo.sh
sudo apt-get update
sudo apt-get install google-fluentd
sudo apt-get install -y google-fluentd-catch-all-config
Edit the Logging agent configuration at /etc/google-fluentd/config.d/syslog.conf to collect entries from auth.log by adding:
<source>
  type tail
  # Parse the timestamp, but still collect the entire line as 'message'
  format /^(?<message>(?<time>[^ ]*\s*[^ ]* [^ ]*) .*)$/
  path /var/log/auth.log
  pos_file /var/lib/google-fluentd/pos/auth.log.pos
  read_from_head false
  tag auth
</source>
Restart google-fluentd to apply the updated configuration:
sudo service google-fluentd restart
Go to Cloud Console -> Logging -> Logs Explorer and look for events related to the SSH daemon (an example query is sketched below).
Configure alerts; in addition, have a look at the third-party examples here and here.
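As an illustration, the same search can be run from the command line with gcloud. The filter below keys off the auth tag configured above and typical sshd messages, so treat it as an assumption and adjust it to what you actually see in Logs Explorer:
# Read recent auth-log entries that mention sshd (filter is an assumption)
gcloud logging read 'logName:"auth" AND textPayload:"sshd"' \
  --limit=20 \
  --format="value(timestamp, textPayload)"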
You can use the same approach for checking and monitoring RDP on Windows:
Install the Logging agent
Edit the Logging agent configuration located at C:\Program Files (x86)\Stackdriver\LoggingAgent\fluent.conf
Restart the Logging agent
Go to Cloud Console -> Logging -> Logs Explorer and look for events related to the RDP service
Configure alerts
EDIT
In order for the incidents not to close automatically, you can create two log-based metrics:
one that counts sshd termination signals and one that counts sshd startups.
Then create an alert that fires when the first metric is above zero AND the second is below one.
That way, the incident stays open until sshd is started again.
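A minimal sketch of creating those two log-based metrics with gcloud; the metric names and filter strings are assumptions based on typical sshd syslog messages, so adjust them to the entries you actually see:
# Metric 1: sshd termination signals (filter string is an assumption)
gcloud logging metrics create sshd-terminations \
  --description="sshd received a termination signal" \
  --log-filter='logName:"auth" AND textPayload:"sshd" AND textPayload:"Received signal 15"'
# Metric 2: sshd startups (filter string is an assumption)
gcloud logging metrics create sshd-startups \
  --description="sshd started listening" \
  --log-filter='logName:"auth" AND textPayload:"sshd" AND textPayload:"Server listening"'
The alerting policy (metric 1 above zero AND metric 2 below one) can then be built on top of these metrics in Cloud Monitoring.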

How to increase GitLab job concurrency on a GKE Kubernetes runner?

I created two GitLab runners through the standard default runner creation UI in GitLab (a 3-node n1-standard-4 GKE cluster). I've been trying to get my GitLab runner to handle more than the default 4 concurrent jobs, but for some reason the limit is still capped at 4 jobs at once.
In GCP I changed the concurrent value in config.toml from 4 to 20 under the config maps runner-gitlab-runner and values-content-configuration-runner that were generated in my cluster, found under the https://console.cloud.google.com/kubernetes/config menu.
What else do I need to change to allow my GitLab runners to run more than 4 jobs at once?
Do I need to change the limit in the runner options? If so, where would I find that config in GCP?
Changing the config maps in GCP does not immediately update the cluster; you need to manually restart the deployment with:
kubectl rollout restart deployment/runner-gitlab-runner -n gitlab-managed-apps
There is also a button in the GCP cluster menu that opens a browser terminal with the kubeconfig of your cluster already loaded.
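For reference, a quick way to check that the new value is actually in the config map and that the pods picked it up; the config map name and namespace come from the question above, while the pod label selector is an assumption:
# Show the concurrent setting currently stored in the config map
kubectl -n gitlab-managed-apps get configmap runner-gitlab-runner -o yaml | grep -i concurrent
# Watch the restarted deployment come back up, then list its pods (label selector is a guess)
kubectl -n gitlab-managed-apps rollout status deployment/runner-gitlab-runner
kubectl -n gitlab-managed-apps get pods -l app=runner-gitlab-runner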

How to set the leader attribute on an AWS Beanstalk instance?

I have the configuration below in AWS Beanstalk:
Environment type: Load balanced, auto scaling
Number of instances: 1 - 4
When a new instance is created, a crontab is added for it, so duplicate crons end up executing. How can I make the crontab run on only one instance?
I am using .ebextensions in my project.
You can't specify which instance is assigned the leader flag; there is an election process that determines which instance "wins".
That said, you can use the leader_only flag in your .ebextensions/crontab.config file when you create the crontab. It might look something like this:
container_commands:
  01_create_crontab:
    command: "> /etc/cron.d/mycrontab && chmod 644 /etc/cron.d/mycrontab"
    leader_only: true
Rather than relying on setting up cron jobs at deploy time, which is unreliable (AWS may change the leader after a deploy), you should think about checking or setting the leader instance in the actual job code.
See https://github.com/dignoe/whenever-elasticbeanstalk/blob/master/bin/ensure_one_cron_leader ; it's Ruby, not PHP, but the code is readable.
Deploy the crontab definition to all instances, but when the job is triggered:
* use a distributed mutex lock (either Redis/ElastiCache or database backed) so that only one worker can proceed (a minimal sketch follows this list);
* or, possibly more complicated to code: check on all instances whether they are the leader, elect one if there is none, and then continue the job only on the leader.
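Here is a minimal sketch of that mutex approach, assuming a shared Redis/ElastiCache endpoint reachable from every instance; the Redis host, lock key, and job path are placeholders:
#!/bin/bash
# Try to take a lock that expires after an hour; NX = only set the key if it does not exist yet
if [ "$(redis-cli -h my-redis.example.com SET cron:nightly-job "$HOSTNAME" NX EX 3600)" = "OK" ]; then
  # Only the instance that won the lock actually runs the job
  /usr/bin/php /var/app/current/cron/nightly.php
fi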
Alternatively, you could switch to a Worker tier environment or an SNS/SQS-based setup: define the schedule in cron.yaml and expose a POST endpoint that accepts requests from localhost only (a local daemon process running on the worker server listens for SQS messages and triggers the jobs via POST). Not an ideal solution, but it's a trade-off, and there are other benefits to the AWS EB platform.

How do I get Spark installed on Amazon's EMR core/worker nodes/instances while creating the cluster?

I am trying to launch an EMR cluster with Spark (1.6.0) and Hadoop (Distribution: Amazon 2.7.1) applications. The release label is emr-4.4.0.
The cluster gets set up as needed, but it doesn't run the Spark master as a daemon process on the master instance, and I cannot find Spark installed on the worker (core) instances (the Spark dir under /usr/lib/ has just lib and yarn directories).
I'd like the Spark master and worker nodes to be running as soon as the cluster has been set up (i.e., workers connect to the master automatically and become part of the Spark cluster).
How do I achieve this? Or am I missing something?
Thanks in advance!
Spark on EMR is installed in YARN mode, which is why you do not see standalone master and slave daemons. http://spark.apache.org/docs/latest/running-on-yarn.html
Standalone Spark master and worker daemons are spawned only in Spark standalone mode. http://spark.apache.org/docs/latest/spark-standalone.html
Now, if you do want to run Spark masters and workers on EMR, you can do so using
/usr/lib/spark/sbin/start-master.sh
/usr/lib/spark/sbin/start-slave.sh
and configuring accordingly.
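For illustration, the standalone scripts take the master URL as an argument, and most workloads can instead be submitted straight to YARN (the default on EMR); the host name, class, and jar path below are placeholders:
# Standalone route: start the master on the master node, then point each worker at it
/usr/lib/spark/sbin/start-master.sh
/usr/lib/spark/sbin/start-slave.sh spark://ip-10-0-0-1.ec2.internal:7077
# YARN route: submit the application directly to YARN, no standalone daemons needed
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyJob \
  /home/hadoop/my-job.jar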

WSO2 Cluster configuration deployment synchronization filter

In a cluster configuration, every modification is propagated from the manager to the workers. I have a scheduled task that has to run only on the manager.
How can I stop the synchronization with the workers?
Deployment synchronization does not happen as a scheduled task. Once an artifact is uploaded to the manager node, the manager sends a cluster message to all workers saying the repository has been updated; when the workers receive the message, they update their repositories. So if you want to disable the workers' dep-sync, you need to disable the DeploymentSynchronizer in carbon.xml (repository/conf/):
<DeploymentSynchronizer>
<Enabled>false</Enabled>
...
</DeploymentSynchronizer>
Please refer to this for more details.