How can you check previous yarn ApplicationId? - mapreduce

Let's say I want to check the YARN logs with the command "yarn logs", but I can't get at the ApplicationId of a MapReduce job, either through the job's output or through the Spark context of the code. How can I check the last application IDs that have been executed?

To get the list of all the applications submitted so far, you can use the following command:
yarn application -list -appStates ALL
You can also filter the applications based on their state (possible states: NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED and KILLED).
For example, to get the list of all the FAILED applications, you can execute the following command:
yarn application -list -appStates FAILED
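If you just need the most recent application, one option is to parse the newest ID out of that list and feed it to yarn logs. A rough sketch (the column layout of the -list output can vary between Hadoop versions, so the parsing here is an assumption):

# Grab the most recently submitted application ID from the list output.
# IDs look like application_<clusterTimestamp>_<sequence>, so a lexical
# sort is usually enough to find the newest one.
APP_ID=$(yarn application -list -appStates ALL 2>/dev/null \
  | awk '$1 ~ /^application_/ {print $1}' | sort | tail -n 1)

# Fetch the aggregated logs for that application.
yarn logs -applicationId "$APP_ID"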

Related

Checking for the result of the AWS CLI 'run-task' command: did the task stop successfully or from an error?

I'm currently moving an application off of static EC2 servers to ECS; until now the release process has been SSHing into the server to git pull and migrate the database.
I've created everything I need using Terraform to deploy my code from my organisation's Elastic Container Registry. I have a cluster, some services and task definitions.
I can deploy the app successfully for any given version now, however my main problem is finding a way to run migrations.
My approach so far has been to split the application into three services: a 'web' service which handles all HTTP traffic (serving the frontend, responding to API requests), a 'cron' service which handles things like sending emails/push notifications at specific times/events, and a 'migrate' service which is just the 'cron' service with the container's entryPoint overridden to run only the migrations (I don't need any of the apache2 stuff in that container, and I didn't see a reason to build another image just for migrations).
The problem I had with this was that the 'migrate' service would constantly try to schedule more tasks for migrating the database, even though it only needs to be done once. So I've scrapped it as a service but kept the task definition, so that I can still run it in my cluster.
As part of the deploy process I'm writing, I run that task inside the cluster via a bash script so I can wait until the migrations finish before deciding whether to take the application out of maintenance mode (if the migrations fail) or to deploy the new 'web'/'cron' containers once the migration has been completed.
Currently this is inside a shell script (run by GitHub Actions) that looks like this:
#!/usr/bin/env bash
CLUSTER_NAME=$1
echo "$CLUSTER_NAME"

# Start the migration task; abort the deploy if the CLI call itself fails.
OUTPUT=`aws ecs run-task --cluster ${CLUSTER_NAME} --task-definition saas-app-migrate`
if [ $? -ne 0 ]; then
    >&2 echo $OUTPUT
    exit 1
fi

# Pull the task ARNs out of the JSON response and strip the quoting.
TASKS=`echo $OUTPUT | jq '.tasks[].taskArn' | jq @sh | sed -e "s/'//g" | sed -e 's/"//g'`
for task in $TASKS
do
    : # check for task to be done
done
Because $TASKS contains the taskArn of any tasks that have been spawned by this, I am freely able to query the tasks; however, I don't know what information I'm looking for.
The AWS documentation says I should use the 'describe-tasks' command to find out why a task has reached the 'STOPPED' status, as it provides 'stopCode' and 'stoppedReason' properties in the response. However, it doesn't say what these values would be if the task stopped successfully. I don't want to have to introduce a manual step in my deployment where I wait until the migrations are done - with the application not being usable - to then tell my release process to continue.
Is there a link to documentation I might have missed with the values I'm searching for, or an alternate way to handle this case?
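One way to close that loop, as a sketch rather than a definitive answer (it assumes the migration runs in a single container per task): wait for each task to reach STOPPED, then read the container exit code from describe-tasks. A task that exits on its own typically reports stopCode EssentialContainerExited whether or not the command succeeded, so the container exit code is the more useful success signal.

for task in $TASKS
do
    # Block until this task reaches the STOPPED state.
    aws ecs wait tasks-stopped --cluster "$CLUSTER_NAME" --tasks "$task"

    # Read the exit code of the (single) container in the task.
    EXIT_CODE=$(aws ecs describe-tasks \
        --cluster "$CLUSTER_NAME" \
        --tasks "$task" \
        --query 'tasks[0].containers[0].exitCode' \
        --output text)

    # A non-zero exit code means the migration command itself failed.
    if [ "$EXIT_CODE" != "0" ]; then
        >&2 echo "Migration task $task failed with exit code $EXIT_CODE"
        exit 1
    fi
done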

AWS-RunBashScript errors/warnings with Python

I have many EC2 instances that retain Celery jobs for processing. To efficiently start the overall task of working through the queue, I have tested AWS-RunBashScript in AWS SSM with a bash script that calls a Python script. For example, for a single instance this begins with sh start_celery.sh.
When I run the command in SSM, I get the following output (compare it to the other output below, after reading on):
/home/ec2-user/dh2o-py/venv/local/lib/python2.7/dist-packages/celery/utils/imports.py:167:
UserWarning: Cannot load celery.commands extension u'flower.command:FlowerCommand':
ImportError('No module named compat',)
namespace, class_name, exc))
/home/ec2-user/dh2o-py/tasks/task_harness.py:49: YAMLLoadWarning: calling yaml.load() without
Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
task_configs = yaml.load(conf)
Running a worker with superuser privileges when the worker accepts messages serialized with pickle is a very bad idea!
If you really want to continue then you have to set the C_FORCE_ROOT
environment variable (but please think about this before you do).
User information: uid=0 euid=0 gid=0 egid=0
failed to run commands: exit status 1
Note that only warnings are thrown. When I SSH to the same instance and run the same command (i.e. sh start_celery.sh), the same output results, but the process actually runs.
I have verified that the process does NOT run when doing this via SSM, and I have no idea why. As a work-around, I tried running the sh start_celery.sh command with bootstrapping in user data for each EC2, but that failed too.
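For reference, this is roughly how the command is sent through SSM from the CLI (a sketch: the instance ID and working directory are placeholders, and the stock AWS-RunShellScript document is used here as a stand-in):

# Rough sketch of launching the script through SSM Run Command.
aws ssm send-command \
    --document-name "AWS-RunShellScript" \
    --instance-ids "i-0123456789abcdef0" \
    --parameters 'commands=["cd /home/ec2-user/dh2o-py && sh start_celery.sh"]'
# Note: a worker that must outlive the Run Command invocation generally has
# to be detached (e.g. wrapped in nohup ... &), which may be relevant here.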
So, why does SSM fail to actually run the process that succeeds when I SSH into each instance and run the identical command? The details below relate to the machine and Python configuration.

How to execute mvn clean install in goCD

The mvn clean install command does not execute in GoCD. The pipeline gets triggered, but nothing is displayed in the logs and the job keeps running forever, even after setting the inactivity time to 1 minute.
I have created a pipeline and added the mvn clean install command to it as in the image below. Please let me know what needs to be changed to generate artifacts as a first step.
The most important clue is in your first screenshot: it says "Agent: Not yet assigned". That means that no agent (aka worker) could be found that can handle your job.
Please read the manual on managing agents, specifically the section Matching jobs to agents.
Frequent reasons why no agent can be assigned:
No agents available at all
The agent(s) are in environments, but the pipeline isn't
Mismatch between resources specified in the job and in the agent management.
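To narrow down which of these applies, the GoCD agents API lists each agent together with its state, resources and environments. A rough sketch (the server URL and credentials are placeholders, and the Accept header and response fields can differ between GoCD releases):

# List all agents with their state, resources and environments.
curl -s -u "admin:password" \
    -H 'Accept: application/vnd.go.cd+json' \
    'https://your-gocd-server/go/api/agents'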

Google cloud compute startup script ignored with no logging

I have a standard Debian 8.9 instance on Google Compute Engine (GCE) where my startup script is ignored.
In the custom metadata field, for startup-script, I am trying to run an Rscript (which is used for batch execution of R files), followed by a system shutdown, with the following:
#! /bin/bash
sudo /usr/bin/Rscript /home/myuser/launch_script.R
sudo shutdown -h now
Starting the instance is immediately followed by a shutdown and the Rscript is ignored. Removing the last line to shutdown causes the GCE instance to start, but the Rscript to be ignored. Running just "sudo /usr/bin/Rscript /home/myuser/launch_script.R" from the terminal results in the script being run. It has a chmod of 755, so I don't think this is a permissions issue.
In addition to this problem, I have read elsewhere that logging should happen in /var/log/, but there is nothing there. Instead, I have a bunch of log files (that only contain the start-up script and nothing else) in the root of my instance.
I got in touch with Google cloud support, who gave the following response:
script definition is kept under /var/run/google.startup.script
If the script does not run initially, you can force it manually with:
sudo google_metadata_script_runner --script-type startup    # on Debian
sudo /usr/share/google/run-startup-scripts                  # on Ubuntu and older images
I'm posting this information here, because it is not in their documentation (as of August 2017). I'm not sure how helpful it is, since the google.startup.script didn't exist in my case (using the latest Debian image on GCE), but I did run the other commands.
However, I think my main issues were:
I was using autossh to connect to a remote database, and the startup-script was running before autossh was up. Building a 40 second delay into the script and running the script as a normal user (not as root via sudo) seems to have solved this problem for now; see the sketch below. Autossh was being run as the main user, which I think gets loaded before lower-privilege user-defined scripts get loaded.
I was using some gcloud commands from the user account which had its own authentication issues. Running gcloud auth login as the user and ensuring correct permissions on my private key solved this.
Always remember to check the messages and syslog files in /var/log for troubleshooting. This allowed me to see the order of things being loaded at system-boot.
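For reference, a rough sketch of what the adjusted startup-script looks like with those changes (the username, the 40 second delay and the log file are specific to my setup and only illustrative):

#! /bin/bash
# Give services started at boot (here, the autossh tunnel) time to come up.
sleep 40

# Run the R batch job as a normal user rather than root, and keep a log
# in /var/log for troubleshooting alongside syslog and messages.
sudo -u myuser /usr/bin/Rscript /home/myuser/launch_script.R \
    >> /var/log/launch_script.log 2>&1

# Shut the instance down once the batch job has finished.
shutdown -h now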

unable to update source code using cfn-hup in aws

I am trying to update source code on an EC2 instance using the cfn-hup service in CloudFormation (AWS).
When I update the stack with a new source code build number, the source code does not change on the EC2 instance.
The cfn-hup service is running fine and all configurations are OK.
Below are the logs of cfn-hup.
2016-03-05 08:48:19,912 [INFO] Data has changed from previous state; action for cfn-auto-reloader-hook will be run
2016-03-05 08:48:19,912 [INFO] Running action for cfn-auto-reloader-hook
2016-03-05 08:48:20,191 [WARNING] Action for cfn-auto-reloader-hook exited with 1; will retry on next iteration
Can anyone please help me with this?
The error states Action for cfn-auto-reloader-hook exited with 1. This means that the action specified in your cfn-auto-reloader-hook has been executed, but returned an error code of 1 indicating a failure state. The good news is that everything else is set up correctly (the cfn-hup script is installed and running, it correctly detected a metadata change, and it found the cfn-auto-reloader hook).
Look at the action= line in your cfn-hup entry for this hook. A typical hook will look something like this:
[cfn-auto-reloader-hook]
triggers=post.update
path=Resources.WebServerInstance.Metadata.AWS::CloudFormation::Init
action=some_shell_command_here
runas=root
To find the hook, run cat /etc/cfn/hooks.d/cfn-auto-reloader.conf on the instance, or trace back where these file contents are defined in your CloudFormation template (e.g., in the example LAMP stack, this hook is created by the files section of an AWS::CloudFormation::Init Metadata Resource, used by the cfn-init helper script). Try manually executing the line in a local shell. If it fails, use the relevant output or error logs to continue debugging. Change the command and cfn-hup should succeed the next time it runs.
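A rough debugging sequence on the instance could look like this (the log file locations are the defaults used by the cfn helper scripts; the action command is the placeholder from the hook above):

# Show the hook and the action it runs.
cat /etc/cfn/hooks.d/cfn-auto-reloader.conf

# Run the action by hand (substitute the real action= command) and check its
# exit status; a non-zero status reproduces the hook failure.
some_shell_command_here
echo $?

# Inspect the helper-script logs for the underlying error.
tail -n 50 /var/log/cfn-hup.log
tail -n 50 /var/log/cfn-init.log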
This means one of your cfn-init items is failing. If it works the first time, it's likely that you have a commands section item that needs a "test" clause to determine if it needs to run or not.
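For example, a commands item in AWS::CloudFormation::Init only runs its command when the test command exits with status 0. A hypothetical fragment (the command names and paths are illustrative):

# Illustrative AWS::CloudFormation::Init metadata fragment.
commands:
  01_deploy_source:
    # Only run the deploy when the new build is not already in place.
    test: "! test -d /var/www/app/releases/current-build"
    command: "/usr/local/bin/deploy_source.sh"
    cwd: "/var/www/app"

A guard like this keeps the hook action idempotent, so a retry on the next cfn-hup iteration does not fail just because the work was already done.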