I have many EC2 instances that hold Celery jobs for processing. To start working through the queue efficiently, I have been testing the AWS-RunBashScript document in AWS SSM with a Bash script that calls a Python script. For example, for a single instance this begins with sh start_celery.sh.
When I run the command via SSM, this is the output (compare it to the other output below, after reading on):
/home/ec2-user/dh2o-py/venv/local/lib/python2.7/dist-packages/celery/utils/imports.py:167:
UserWarning: Cannot load celery.commands extension u'flower.command:FlowerCommand':
ImportError('No module named compat',)
namespace, class_name, exc))
/home/ec2-user/dh2o-py/tasks/task_harness.py:49: YAMLLoadWarning: calling yaml.load() without
Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
task_configs = yaml.load(conf)
Running a worker with superuser privileges when the worker accepts messages serialized with pickle is a very bad idea!
If you really want to continue then you have to set the C_FORCE_ROOT
environment variable (but please think about this before you do).
User information: uid=0 euid=0 gid=0 egid=0
failed to run commands: exit status 1
Note that only warnings are raised. When I SSH into the same instance and run the same command (i.e. sh start_celery.sh), the same output results, BUT the process runs:
I have verified that the process does NOT run when doing this via SSM, and I have no idea why. As a work-around, I tried running the sh start_celery.sh command via bootstrapping in the user data for each EC2 instance, but that failed too.
So why does SSM fail to actually run the process, when running identical commands via SSH to each instance succeeds? The details below relate to the machine and Python configuration:
Related
I have a Django app that is deployed on Kubernetes. The container also has a mount to a persistent volume containing some files that are needed for operation. I want a check that verifies the files are there and accessible at runtime every time a pod starts. The Django documentation recommends against running checks in production (the app runs in uWSGI), and because the files are only available in the production environment, the check will fail when unit tested.
What would be an acceptable process for executing the checks in production?
This is a community wiki answer posted for better visibility. Feel free to expand it.
Your use case can be addressed from the Kubernetes perspective. All you have to do is use startup probes:
The kubelet uses startup probes to know when a container application
has started. If such a probe is configured, it disables liveness and
readiness checks until it succeeds, making sure those probes don't
interfere with the application startup. This can be used to adopt
liveness checks on slow starting containers, avoiding them getting
killed by the kubelet before they are up and running.
With them you can use an ExecAction, which executes a specified command inside the container. The diagnostic is considered successful if the command exits with a status code of 0. A simple example is a check that a particular file exists:
exec:
  command:
  - stat
  - /file_directory/file_name.txt
You could also use a shell script but remember that:
Command is the command line to execute inside the container, the
working directory for the command is root ('/') in the container's
filesystem. The command is simply exec'd, it is not run inside a
shell, so traditional shell instructions ('|', etc) won't work. To use
a shell, you need to explicitly call out to that shell.
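Building on that, a minimal sketch of a startup probe that calls out to a shell explicitly (the file path follows the example above; the thresholds are illustrative and should be tuned to your pod's startup time):

startupProbe:
  exec:
    command:
    - /bin/sh
    - -c
    - stat /file_directory/file_name.txt && echo ok   # explicit shell so operators like && work
  failureThreshold: 30
  periodSeconds: 10

The pod then has failureThreshold * periodSeconds (here 300 seconds) to pass the check before the kubelet gives up and restarts the container.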
I am running Django, Celery, and RabbitMQ in a Docker container.
It's all configured and running well; however, when I try to install django-celery-beat I have a problem initialising the service.
Specifically, this command:
celery -A project beat -l info --scheduler django_celery_beat.schedulers:DatabaseScheduler
Results in this error:
celery.platforms.LockFailed: [Errno 13] Permission denied: '/usr/src/app/celerybeat.pid'
When looking at causes/solutions, the permission denied error appears to occur when the default scheduler (celery.beat.PersistentScheduler) attempts to keep track of the last run times in a local shelve database file and doesn't have write access.
However, I am using django-celery-beat and applying the --scheduler flag to use the django_celery_beat.schedulers:DatabaseScheduler scheduler, which should store the schedule in the Django database and therefore not require write access.
What else could be causing this problem? / How can I debug this further?
celery beat (celery.bin.beat) creates a pid file where it stores the process id:
--pidfile
File used to store the process pid. Defaults to celerybeat.pid.
The program won’t start if this file already exists and the pid is
still alive.
You can leave --pidfile= empty in your command, but beware that beat will then not know if there is more than one celery beat process active.
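For example, a minimal sketch based on the command from the question (the only change is the empty --pidfile flag at the end):

celery -A project beat -l info --scheduler django_celery_beat.schedulers:DatabaseScheduler --pidfile=

With no pid file, beat never tries to create /usr/src/app/celerybeat.pid, so the permission error goes away, at the cost of losing the duplicate-process guard.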
In Informatica, I can start a workflow but cannot get it to recognize my instance name in the session log and Workflow Monitor.
The workflow starts but in the session log it displays this:
Workflow wf_Temp started with run id [22350], run instance name [], run type [Concurrent Run with Unique Instance Name]
Instance name is blank.
My command is:
pmcmd startworkflow -sv <service> -d <domain> -u <user> -p <password> -f <folder> -rin INST1 -paramfile <full param file path name> wf_Temp
I have edited the workflow and selected the concurrent execution checkbox. In the Configure Concurrent Execution dialog, I have created three instances: INST1, INST2, INST3, but without any associated parameter files. All parameter files are blank.
I understand, I think, that in order to start a workflow with PMCMD I must pass in one of the configured instance names (i.e. INST1, INST2, INST3, etc.).
If I execute the PMCMD command from Putty a second time to see the second instance run, I receive a message that the workflow is still running and I have to wait. Why? I have checked the concurrent execution box in the workflow.
ERROR: Workflow [wf_Temp]: Could not start execution of this workflow because the current run on this Integration Service has not completed yet.
Disconnecting from Integration Service
So, I think I'm close, but am missing something. The workflow runs with the parameter file I pass in PMCMD but the instance name seems to be ignored.
Further: do I have to pre-configure instance names in the Workflow Manager? Are the PMCMD instance and parameter file parameters enough? It doesn't seem quite so dynamic if instances have to be pre-defined in the workflows.
Thanks.
#MacieJG
Here are the screenshots from Putty when I run the command. You can see the instance name DALLAS is being passed through PMCMD OK. No combination ever gets the instance name. I did not include the pics of your suggested Test 1, but the results were the same: still no instance.
Here's my complete test as requested in a comment above. I tried my best to put everything you may need here, but if I missed anything, just let me know. So here goes...
I've created a very simple workflow to run with an instance name. It uses a timer to wait and a command task to write the instance name to a file:
The concurrent execution has been set up in the most simple way:
Now, I've prepared the following batch file to run the workflow (just the user & password removed):
SET "PMCMD=C:\Informatica\9.5.1\clients\PowerCenterClient\CommandLineUtilities\PC\server\bin\pmcmd"
%PMCMD% startworkflow -sv Dev_IS -d Domain_vic-vpc -u ####### -p ####### -f Dev01 -rin GLASGOW wf_Instance_Test
%PMCMD% startworkflow -sv Dev_IS -d Domain_vic-vpc -u ####### -p ####### -f Dev01 -rin FRANKFURT wf_Instance_Test
%PMCMD% startworkflow -sv Dev_IS -d Domain_vic-vpc -u ####### -p ####### -f Dev01 -rin GLASGOW wf_Instance_Test
It runs three instances, two of them with same name, just to test it. I run the batch the following way to capture the output:
pmStartTestWF.bat > c:\MG\pmStartTestWF.log
Once I execute it, here what I see in workflow monitor:
Just as expected, three instances executed and properly displayed. File output looks fine as well:
The output of pmcmd can be found here. Full definition of my test workflow is available here.
I really hope this will help you somehow. Feel free to let me know if you'd find anything missing here. Good luck!
You don't need to pre-configure instance names in workflow. Passing the instance name in pmcmd along with parameter filename is enough.
try this: pmcmd startworkflow -sv (service) -d (domain) -u (user) -p (password) -f (folder) -paramfile (full param file path name) -rin INST1 wf_Temp
To be precise: when you configure Concurrent Execution, you can specify if you:
allow concurrent run with same instance name
allow concurrent run only with unique instance name
In addition to that you may, but don't have to, indicate which instance should use which parameter file, so it doesn't need to be mentioned when executing. But that's a separate feature.
Now, if you've chosen the first one, you will be able to invoke the WF multiple times with the very same command. If you've chosen the second one and try this, you will get the 'WF is already running' error.
The trouble is that your example seems correct at first glance. As per the log message:
Workflow wf_Temp started with run id [22350], run instance name [], run type [Concurrent Run with Unique Instance Name]
So you're allowing unique instances only, and it seems the instance name has not been used. The first execution does not set the instance name, so a similar second execution won't use it either and will get rejected, since it uses the same (empty) instance name.
You may try changing the setting to Allow concurrent run with same instance name; this should allow the second execution, but it does not solve the main issue: for some reason the instance name does not get passed.
Please verify your command against the docs referenced below. Try to match the order perhaps. Please share some more info if it still fails.
Looking at the docs:
pmcmd StartWorkflow
<<-service|-sv> service [<-domain|-d> domain] [<-timeout|-t> timeout]>
<<-user|-u> username|<-uservar|-uv> userEnvVar>
<<-password|-p> password|<-passwordvar|-pv> passwordEnvVar>
[<<-usersecuritydomain|-usd> usersecuritydomain|<-usersecuritydomainvar|-usdv>
userSecuritydomainEnvVar>]
[<-folder|-f> folder]
[<-startfrom> taskInstancePath]
[<-recovery|-norecovery>]
[<-paramfile> paramfile]
[<-localparamfile|-lpf> localparamfile]
[<-osprofile|-o> OSUser]
[-wait|-nowait]
[<-runinsname|-rin> runInsName]
workflow
I have a standard Debian 8.9 instance on google cloud compute (GCE) where my startup script is ignored.
In the custom metadata field, for startup-script, I am trying to run an Rscript (which is used for batch execution of R files), followed by a system shutdown, with the following:
#! /bin/bash
sudo /usr/bin/Rscript /home/myuser/launch_script.R
sudo shutdown -h now
Starting the instance is immediately followed by a shutdown and the Rscript is ignored. Removing the last line to shutdown causes the GCE instance to start, but the Rscript to be ignored. Running just "sudo /usr/bin/Rscript /home/myuser/launch_script.R" from the terminal results in the script being run. It has a chmod of 755, so I don't think this is a permissions issue.
In addition to this problem, I have read elsewhere that logging should happen in /var/log/, but there is nothing there. Instead, I have a bunch of log files (that only contain the start-up script and nothing else) in the root of my instance:
I got in touch with Google cloud support, who gave the following response:
script definition is kept under /var/run/google.startup.script
If the script does not run initially, you can force it manually with:
$ sudo google_metadata_script_runner --script-type startup # for Debian
or
# sudo /usr/share/google/run-startup-scripts # on Ubuntu and older images
I'm posting this information here, because it is not in their documentation (as of August 2017). I'm not sure how helpful it is, since the google.startup.script didn't exist in my case (using the latest Debian image on GCE), but I did run the other commands.
However, I think my main issues were:
I was using autossh to connect to a remote database, and the startup-script was running before autossh. Building a 40-second delay into the script and running the script as a regular user (not sudo-type root) seems to have solved this problem for now (see the sketch after this list). Autossh was being run as the main user, which I think gets loaded before lower-privilege user-defined scripts get loaded.
I was using some gcloud commands from the user account which had its own authentication issues. Running gcloud auth login as the user and ensuring correct permissions on my private key solved this.
Always remember to check the messages and syslog files in /var/log for troubleshooting. This allowed me to see the order of things being loaded at system-boot.
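A minimal sketch of the adjusted startup script, assuming the main user is called myuser and that a fixed 40-second delay is enough for autossh to bring the tunnel up (both assumptions are illustrative):

#! /bin/bash
# give autossh (started for the main user) time to establish the tunnel
sleep 40
# run the R script as the regular user instead of root
sudo -u myuser /usr/bin/Rscript /home/myuser/launch_script.R
sudo shutdown -h now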
I am trying to update source code on an EC2 instance using the cfn-hup service in CloudFormation (AWS).
When I update the stack with new source code (using a build number), the source code does not change on the EC2 instance.
The cfn-hup service is running fine and all configurations are OK.
Below are the cfn-hup logs:
2016-03-05 08:48:19,912 [INFO] Data has changed from previous state; action for cfn-auto-reloader-hook will be run
2016-03-05 08:48:19,912 [INFO] Running action for cfn-auto-reloader-hook
2016-03-05 08:48:20,191 [WARNING] Action for cfn-auto-reloader-hook exited with 1; will retry on next iteration
Can anyone please help me with this?
The error states Action for cfn-auto-reloader-hook exited with 1. This means that the action specified in your cfn-auto-reloader-hook has been executed, but returned an error code of 1 indicating a failure state. The good news is that everything else is set up correctly (the cfn-hup script is installed and running, it correctly detected a metadata change, and it found the cfn-auto-reloader hook).
Look at the action= line in your cfn-hup entry for this hook. A typical hook will look something like this:
[cfn-auto-reloader-hook]
triggers=post.update
path=Resources.WebServerInstance.Metadata.AWS::CloudFormation::Init
action=some_shell_command_here
runas=root
To find the hook, run cat /etc/cfn/hooks.d/cfn-auto-reloader.conf on the instance, or trace back where these file contents are defined in your CloudFormation template (e.g., in the example LAMP stack, this hook is created by the files section of an AWS::CloudFormation::Init Metadata Resource, used by the cfn-init helper script). Try manually executing the line in a local shell. If it fails, use the relevant output or error logs to continue debugging. Change the command and cfn-hup should succeed the next time it runs.
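As a sketch of that manual test (the stack name, resource name, and region are placeholders; in the LAMP example the hook's action is a cfn-init call, but substitute whatever your own action= line contains):

cat /etc/cfn/hooks.d/cfn-auto-reloader.conf    # read the action= line
# then run the same command by hand, e.g. for the LAMP-style hook:
sudo /opt/aws/bin/cfn-init -v --stack <stack-name> --resource WebServerInstance --region <region>
echo $?    # a non-zero exit status reproduces the cfn-hup failure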
This means one of your cfn-init items is failing. If it works the first time, it's likely that you have a commands section item that needs a "test" clause to determine if it needs to run or not.
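A minimal sketch of such a guarded commands item in AWS::CloudFormation::Init metadata (YAML; the command, marker path, and logical names are illustrative, not taken from the question):

"AWS::CloudFormation::Init":
  config:
    commands:
      01_update_source:
        command: /opt/deploy/update_source.sh         # hypothetical deploy step
        test: test ! -e /var/run/app-build-deployed   # cfn-init runs the command only if this exits 0
        cwd: /opt/deploy

cfn-init evaluates the test command first; only when it exits with status 0 does it run the command, which prevents a step that has already been applied from failing on subsequent cfn-hup iterations.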