My goal is to add the sidekiq service to upstart on Amazon Linux 2018.03.
Since I want to upgrade Sidekiq to version 6, the process needs to be managed by the OS (e.g., with upstart), because Sidekiq 6 dropped its built-in daemonization support.
I put a file at /etc/init/sidekiq.conf, taken from here.
After that, the initctl list | grep sidekiq command showed nothing, so I tried sudo initctl reload-configuration, but nothing changed.
The status sidekiq command shows status: Unknown job: sidekiq.
What else do I need to do to add the sidekiq service to upstart?
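A few checks that might narrow this down, assuming upstart really is the init system running on the instance (the job name sidekiq comes from the question; the exact version string will differ):
# Confirm upstart is the running init (should print something like "init (upstart 0.6.5)")
sudo initctl version
# Upstart only reads jobs from /etc/init/*.conf; confirm the file name and that it is readable
ls -l /etc/init/sidekiq.conf
# Reload the job definitions as root and look for the job again
sudo initctl reload-configuration
sudo initctl list | grep sidekiq
# If the job shows up now, try starting it
sudo start sidekiq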
I am running Django, Celery, and RabbitMQ in a Docker container.
It's all configured and running well; however, when I try to install django-celery-beat, I have a problem initialising the service.
Specifically, this command:
celery -A project beat -l info --scheduler django_celery_beat.schedulers:DatabaseScheduler
Results in this error:
celery.platforms.LockFailed: [Errno 13] Permission denied: '/usr/src/app/celerybeat.pid'
When looking at causes/solutions, the permission-denied error appears to occur when the default scheduler (celery.beat.PersistentScheduler) attempts to keep track of the last run times in a local shelve database file and doesn't have write access.
However, I am using django-celery-beat and applying the --scheduler flag to use django_celery_beat.schedulers:DatabaseScheduler, which should store the schedule in the Django database and therefore not require write access.
What else could be causing this problem? / How can I debug this further?
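A few hedged checks that may help debug this, assuming the container can be reached with docker exec (the container name app is a placeholder; the path /usr/src/app comes from the traceback):
# Who does the process run as, and who owns the working directory where the pid file would be written?
docker exec -it app sh -c 'id && ls -ld /usr/src/app'
# Can that user create the pid file at all?
docker exec -it app sh -c 'touch /usr/src/app/celerybeat.pid && rm /usr/src/app/celerybeat.pid'
# As a quick test, point the pid file at a directory that is certainly writable
docker exec -it app celery -A project beat -l info --scheduler django_celery_beat.schedulers:DatabaseScheduler --pidfile=/tmp/celerybeat.pid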
celerybeat (celery.bin.beat) creates a pid file in which it stores its process id:
--pidfile
File used to store the process pid. Defaults to celerybeat.pid.
The program won’t start if this file already exists and the pid is
still alive.
You can leave --pidfile= empty in your command (as shown below), but beware that beat will then not be able to tell whether more than one celerybeat process is active.
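For example, taking the command from the question and simply disabling the pid file (alternatively, --pidfile could point at a location known to be writable, such as /tmp/celerybeat.pid):
celery -A project beat -l info --scheduler django_celery_beat.schedulers:DatabaseScheduler --pidfile=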
(Docker container on AWS-ECS exits before all the logs are printed to CloudWatch Logs)
Why are some streams of a CloudWatch Logs group incomplete (i.e., the Fargate Docker container exits successfully but the logs stop being updated abruptly)? I am seeing this intermittently, in almost all log groups, but not on every log stream/task run. I'm running on version 1.3.0.
Description:
A Dockerfile runs Node.js or Python scripts using the CMD instruction.
These are not servers/long-running processes, and my use case requires the containers to exit when the task completes.
Sample Dockerfile:
FROM node:6
WORKDIR /path/to/app/
COPY package*.json ./
RUN npm install
COPY . .
CMD [ "node", "run-this-script.js" ]
All the logs are printed correctly to my terminal's stdout/stderr when this command is run on the terminal locally with docker run.
To run these as ECS tasks on Fargate, the log driver is set to awslogs via a CloudFormation template.
...
LogConfiguration:
  LogDriver: 'awslogs'
  Options:
    awslogs-group: !Sub '/ecs/ecs-tasks-${TaskName}'
    awslogs-region: !Ref AWS::Region
    awslogs-stream-prefix: ecs
...
Seeing that sometimes the CloudWatch Logs output is incomplete, I have run tests and checked every limit listed in the CW Logs Limits documentation, and I am certain the problem is not there.
I initially thought this was an issue with Node.js exiting asynchronously before console.log() is flushed, or with the process exiting too soon, but the same problem occurs when I use a different language as well, which makes me believe this is not an issue with the code but with CloudWatch specifically.
Inducing delays in the code by adding a sleep timer has not worked for me.
It's possible that, since the Docker container exits immediately after the task completes, the logs don't get enough time to be written over to CloudWatch Logs, but there must be a way to ensure that this doesn't happen?
sample logs:
incomplete stream:
{ "message": "configs to run", "data": {"dailyConfigs":"filename.json"]}}
running for filename
completed log stream:
{ "message": "configs to run", "data": {"dailyConfigs":"filename.json"]}}
running for filename
stdout: entered query_script
... <more log lines>
stderr:
real 0m23.394s
user 0m0.008s
sys 0m0.004s
(node:1) DeprecationWarning: PG.end is deprecated - please see the upgrade guide at https://node-postgres.com/guides/upgrading
UPDATE: This now appears to be fixed, so there is no need to implement the workaround described below
I've seen the same behaviour when using ECS Fargate containers to run Python scripts - and had the same resulting frustration!
I think it's due to CloudWatch Logs Agent publishing log events in batches:
How are log events batched?
A batch becomes full and is published when any of the following conditions are met:
The buffer_duration amount of time has passed since the first log event was added.
Less than batch_size of log events have been accumulated but adding the new log event exceeds the batch_size.
The number of log events has reached batch_count.
Log events from the batch don't span more than 24 hours, but adding the new log event exceeds the 24 hours constraint.
(Reference: https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/AgentReference.html)
So a possible explanation is that log events are buffered by the agent but not yet published when the ECS task is stopped. (And if so, that seems like an ECS issue - any AWS ECS engineers willing to give their perspective on this...?)
There doesn't seem to be a direct way to ensure the logs are published, but it does suggest one could wait at least buffer_duration seconds (by default, 5 seconds), and any prior logs should be published.
With a bit of testing that I'll describe below, here's a workaround I landed on. A shell script run_then_wait.sh wraps the command to trigger the Python script, to add a sleep after the script completes.
Dockerfile
FROM python:3.7-alpine
ADD run_then_wait.sh .
ADD main.py .
# The original command
# ENTRYPOINT ["python", "main.py"]
# To run the original command and then wait
ENTRYPOINT ["sh", "run_then_wait.sh", "python", "main.py"]
run_then_wait.sh
#!/bin/sh
set -e
# Wait 10 seconds on exit: twice the `buffer_duration` default of 5 seconds
trap 'echo "Waiting for logs to flush to CloudWatch Logs..."; sleep 10' EXIT
# Run the given command
"$#"
main.py
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger()

if __name__ == "__main__":
    # After testing some random values, had most luck to induce the
    # issue by sleeping 9 seconds here; would occur ~30% of the time
    time.sleep(9)
    logger.info("Hello world")
Hopefully the approach can be adapted to your situation. You could also implement the sleep inside your script, but it can be trickier to ensure it happens regardless of how it terminates.
It's hard to prove that the proposed explanation is accurate, so I used the above code to test whether the workaround was effective. The test compared the original command against run_then_wait.sh, 30 runs each. The issue was observed 30% of the time with the former, versus 0% of the time with the latter. Hope this is similarly effective for you!
Just contacted AWS support about this issue and here is their response:
...
Based on that case, I can see that this occurs for containers in a
Fargate Task that exit quickly after outputting to stdout/stderr. It
seems to be related to how the awslogs driver works, and how Docker in
Fargate communicates to the CW endpoint.
Looking at our internal tickets for the same, I can see that our
service team are still working to get a permanent resolution for this
reported bug. Unfortunately, there is no ETA shared for when the fix
will be deployed. However, I've taken this opportunity to add this
case to the internal ticket to inform the team of the similar and try
to expedite the process
In the meantime, this can be avoided by extending the lifetime of the
exiting container by adding a delay (~>10 seconds) between the logging
output of the application and the exit of the process (exit of the
container).
...
Update:
Contacted AWS around August 1st, 2019; they say this issue has been fixed.
I observed this as well. It must be an ECS bug?
My workaround (Python 3.7):
import atexit
import logging
from time import sleep

logger = logging.getLogger()

def finalizer():
    logger.info("All tasks have finished. Exiting.")
    # Workaround:
    # Fargate will exit and the final batch of CloudWatch logs will be lost
    sleep(10)

# Register the finalizer after it is defined so it runs at interpreter exit
atexit.register(finalizer)
I had the same problem with flushing logs to CloudWatch.
Following asavoy's answer, I switched from the exec form to the shell form of ENTRYPOINT and added a 10-second sleep at the end.
Before:
ENTRYPOINT ["java","-jar","/app.jar"]
After:
ENTRYPOINT java -jar /app.jar; sleep 10
I have a Rails 4 OpenShift application. I am trying to run a cron job. The script runs completely fine when I run it by itself. The script is:
#!/bin/bash
/bin/bash -l -c 'cd $OPENSHIFT_REPO_DIR && bundle exec bin/rails runner -e production "Payment.charge_customers_pay_experts"'
The problem is that the log file gives me the following error:
Wed Feb 3 22:57:05 EST 2016: START minutely cron run
__________________________________________________________________________
/var/lib/openshift/56a438107628e18b30000111/app-root/runtime/repo//.openshift/cron/minutely/charge_customers_pay_experts:
Warning: You're using Rubygems 2.0.14 with Spring. Upgrade to at least Rubygems 2.1.0 and run `gem pristine --all` for better startup performance.
/var/lib/openshift/56a438107628e18b30000111/app-root/runtime/repo/vendor/bundle/ruby/gems/spring-1.6.2/lib/spring/sid.rb:39:in `getpgid': Permission denied (Errno::EACCES)
from /var/lib/openshift/56a438107628e18b30000111/app-root/runtime/repo/vendor/bundle/ruby/gems/spring-1.6.2/lib/spring/sid.rb:39:in `pgid'
from /var/lib/openshift/56a438107628e18b30000111/app-root/runtime/repo/vendor/bundle/ruby/gems/spring-1.6.2/lib/spring/server.rb:78:in `set_pgid'
from /var/lib/openshift/56a438107628e18b30000111/app-root/runtime/repo/vendor/bundle/ruby/gems/spring-1.6.2/lib/spring/server.rb:34:in `boot'
from /var/lib/openshift/56a438107628e18b30000111/app-root/runtime/repo/vendor/bundle/ruby/gems/spring-1.6.2/lib/spring/server.rb:14:in `boot'
from -e:1:in `<main>'
__________________________________________________________________________
Wed Feb 3 22:57:06 EST 2016: END minutely cron run - status=0
__________________________________________________________________________
I have made sure the script was executable. I'm not sure if I am missing something. Does anyone have any thoughts?
I don't know that the script being executable necessarily has anything to do with this. It looks like a permissions error more than anything. Does the system user that runs the cron job have the correct permissions to run it? You can test this by logging in as that user (or sudo su - <user>) and then executing the command in the script manually.
/bin/bash -l -c 'cd $OPENSHIFT_REPO_DIR && bundle exec bin/rails runner -e production "Payment.charge_customers_pay_experts"'
Be sure to replace your $OPENSHIFT_REPO_DIR variable with the correct path to your OpenShift repo directory.
You may just need to either add the user your cron job runs as to the group that has permissions over the files, or perhaps run the cron job as a more privileged user (one that has permissions over the required files).
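A quick, hedged way to check this from a shell on the gear (the <user> placeholder is whatever user the cron cartridge runs jobs as; on OpenShift gears it is normally the long ID seen in the paths above):
# Become the user the cron job runs as (placeholder)
sudo su - <user>
# Can that user read and execute the script?
ls -l $OPENSHIFT_REPO_DIR/.openshift/cron/minutely/charge_customers_pay_experts
# Run the same command the cron job runs, manually
/bin/bash -l -c 'cd $OPENSHIFT_REPO_DIR && bundle exec bin/rails runner -e production "Payment.charge_customers_pay_experts"'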
BTW, I could only post this as an answer as Stack Overflow is telling me I need 50 reputation points to comment.
I fixed this by commenting out the 'spring' gem in my Gemfile. But apparently this is a known issue: https://bugzilla.redhat.com/show_bug.cgi?id=1305544.
There is a workaround for the time being until this issue is resolved. You can edit /usr/libexec/openshift/cartridges/cron/bin/cron_runjobs.sh to add setsid in front of timeout, so that it runs setsid timeout ...; this allows the timeout command to actually change the sid.
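A sketch of that edit, assuming root access on the node; the sed expression is illustrative only, so check how timeout is actually invoked in cron_runjobs.sh first:
# See how timeout is currently invoked
grep -n 'timeout' /usr/libexec/openshift/cartridges/cron/bin/cron_runjobs.sh
# Back up the script, then prefix the timeout invocation with setsid
sudo cp /usr/libexec/openshift/cartridges/cron/bin/cron_runjobs.sh{,.bak}
sudo sed -i 's/^\(\s*\)timeout /\1setsid timeout /' /usr/libexec/openshift/cartridges/cron/bin/cron_runjobs.sh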
I'm using Puma on a Rails app; it sometimes dies without any particular reason, and it also often dies (does not restart after being stopped) when deployed.
What would be a good way to monitor whether the process has died, and to restart it right away?
Since this is called from within a Rails app, it would be useful to have a way to define this for any app.
I have not found any usable way to do it (I looked into systemd and other Linux daemons, with no success).
Thanks for any feedback.
You can use pumactl to start/stop the Puma server. If you know where the puma.pid file is placed (on a Mac it's usually "#{Dir.pwd}/tmp/pids/puma.pid"), you could do:
bundle exec pumactl -P path/puma.pid stop
To set the pid file path or other options (like daemonizing), you could create a Puma config. You can find an example here. Then start and stop the server with just the config file:
bundle exec pumactl -F config/puma.rb start
You can also restart and check status in this way:
bundle exec pumactl -F config/puma.rb restart
bundle exec pumactl -F config/puma.rb status
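To get the "restart it right away" behaviour from the question, a minimal watchdog along these lines could be run from cron. It is only a sketch: it assumes config/puma.rb daemonizes the server and that pumactl exits non-zero when it cannot reach a running Puma, so check both assumptions against your setup:
#!/bin/sh
# check_puma.sh -- hypothetical watchdog script, e.g. run every minute from cron:
# * * * * * /path/to/app/check_puma.sh >> /tmp/check_puma.log 2>&1
cd /path/to/app || exit 1

if ! bundle exec pumactl -F config/puma.rb status > /dev/null 2>&1; then
  echo "Puma appears to be down, starting it..."
  bundle exec pumactl -F config/puma.rb start
fi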