I've just downloaded jetty 9 and wanted to run it as a daemon. I've set some options in /etc/default/jetty, here they are:
JETTY_HOME=/opt/jetty
JETTY_ARGS=jetty.port=8080
NO_START=0
JETTY_USER=jetty
JETTY_PID=/opt/jetty/jetty.state
JETTY_LOGS=/var/log/jetty
DEBUG=1
When I run service jetty start I get this:
Starting Jetty: FAILED Sun Apr 13 17:20:25 UTC 2014
Well, what could be wrong? There's no info in logs, how can I debug this?
Take a look here for some tips. The default script runs start-stop-daemon with the -b flag, which puts everything into a detached process and prevents output from going to the console. Remove -b and add -v (verbose), and you'll probably get some debugging information and an indication of how far it got.
Related
I am working on a web-based tool (named cloudcopasi) which take jobs from a user and submit it to bosco resources (compute nodes). I am using a bosco version (condor 8.8.12) on Linux CentOS 7. The web interface allows a user to add a bosco pool which user can use to submit jobs. However, when I try to submit a job, it fails. I tried to test the pool as well by using the following command:
bosco_cluster --test
It gives me the following GAHP error:
…..
Testing bosco submission...Passed!
Submission and log files for this job are in /home/cloudcopasi/bosco/local.bosco/bosco-test/boscotest.LTA07r
Waiting for jobmanager to accept job...Passed
Checking for submission to remote slurm cluster (could take ~30 seconds)...Failed
Showing last 5 lines of logs:
01/06/21 13:34:03 [3800] Gahp Server (pid=3815) exited with status 1 unexpectedly
01/06/21 13:34:08 [3800] gahp server not up yet, delaying ping
01/06/21 13:34:08 [3800] No jobs left, shutting down
01/06/21 13:34:08 [3800] Got SIGTERM. Performing graceful shutdown.
01/06/21 13:34:08 [3800] **** condor_gridmanager (condor_GRIDMANAGER) pid 3800 EXITING WITH STATUS 0
I am not sure what I am missing but I don’t understand how to solve this “Gahp server” issue.
Any help is highly appreciated.
Thank you.
This a probably an ssh failure (network, authentication, or authorization). Bosco runs the following command to access the remote cluster submit host:
<sbin>/remote_gahp <user>#<hostname> batch_gahp
You can run it on the command line to get more details about what's going wrong. remote_gahp is a bash script, so you can dig in further, if necessary.
I have a startup script set by an instance template that initializes the server for google compute. After installing postgres, I manually call for it to start using :
/etc/init.d/postgresql start
This completes successfully, but the server is not listening on 5432 when run by the startup script (postgres isn't started, although that service start call completes successfully). After startup completes, and I log in, I can do it successfully. Anyone know why that won't work within the startup script ? I need to load up data during startup so I need to startup postgres during initialization.
I Solved using newer debian image
I had the same problem as you (installing postgresql in GCE startup-script results in the package being installed, but the server is not running), and I think I figured out the root cause.
Normally, the postgresql-11 package is supposed to start the PostgreSQL server after installation. Here is a snippet from its postinst script:
if [ "$1" = configure ]; then
. /usr/share/postgresql-common/maintscripts-functions
configure_version $VERSION "$2"
fi
Taking a look at /usr/share/postgresql-common/maintscripts-functions, we see:
configure_version() {
...
# reload systemd to let the generator pick up the new unit
if [ -d /run/systemd/system ]; then
systemctl daemon-reload
fi
invoke-rc.d postgresql start $VERSION # systemd: argument ignored, starts all versions
}
My debian installation comes with init-system-helpers version "1.56+nmu1", which contains this bit of code in invoke-rc.d:
# avoid deadlocks during bootup and shutdown from units/hooks
# which call "invoke-rc.d service reload" and similar, since
# the synchronous wait plus systemd's normal behaviour of
# transactionally processing all dependencies first easily
# causes dependency loops
if ! systemctl --quiet is-active multi-user.target; then
sctl_args="--job-mode=ignore-dependencies"
fi
case $saction in
start|restart|try-restart)
[ "$_state" != "LoadState=masked" ] || exit 0
systemctl $sctl_args "${saction}" "${UNIT}" && exit 0
;;
The debian postgresql-11 package makes use of templated systemd units. The main one is called postgresql.service but this is a dummy service that doesn't actually do anything. The PostgreSQL server is actually started by a templated unit named postgresql#11-main which is usually started alongside the main service because it has ReloadPropagatedFrom=postgresql.service.
Note that when this issue occurs, the main unit is started but the templated one is not:
$ sudo systemctl status postgresql
● postgresql.service - PostgreSQL RDBMS
Loaded: loaded (/lib/systemd/system/postgresql.service; enabled; vendor preset: enabled)
Active: active (exited) since Fri 2021-04-02 05:40:48 UTC; 32min ago
Main PID: 1663 (code=exited, status=0/SUCCESS)
Tasks: 0 (limit: 4665)
Memory: 0B
CGroup: /system.slice/postgresql.service
Apr 02 05:40:48 hubnext-west-r21r systemd[1]: Starting PostgreSQL RDBMS...
Apr 02 05:40:48 hubnext-west-r21r systemd[1]: Started PostgreSQL RDBMS.
$ sudo systemctl status postgresql#11-main
● postgresql#11-main.service - PostgreSQL Cluster 11-main
Loaded: loaded (/lib/systemd/system/postgresql#.service; enabled-runtime; vendor preset: enabled)
Active: inactive (dead)
That's because when --job-mode=ignore-dependencies is specified, this link is ignored.
The GcE startup script runs as a systemd unit, which starts before multi-user.target is up:
$ find /etc/systemd | grep startup
/etc/systemd/system/multi-user.target.wants/google-startup-scripts.service
Therefore, invoke-rc.d notices that systemctl --quiet is-active multi-user.target is false and adds --job-mode=ignore-dependencies, which results in the PostgreSQL server not starting.
One possible workaround is explicitly running systemd start postgresql#11-main.service from your startup script after installing postgres.
By the way, I noticed that a recent commit (Nov 2020) changed this invoke-rc.d behavior so that it no longer uses --job-mode=ignore-dependencies. That would help avoid this issue.
I'm trying to add my first service on rhel7 (which resides in AWS/EC2), but - the service is not configured correctly - as I get:
[ec2-user#ip-172-30-1-96 ~]$ systemctl status clouddirectd.service -l
● clouddirectd.service - CloudDirect Daemon
Loaded: loaded (/usr/lib/systemd/system/clouddirectd.service; enabled; vendor preset: disabled)
Active: activating (auto-restart) (Result: exit-code) since Tue 2018-01-09 16:09:42 EST; 8s ago
Main PID: 10064 (code=exited, status=217/USER)
Jan 09 16:09:42 ip-172-30-1-96.us-west-1.compute.internal systemd[1]: clouddirectd.service: main process exited, code=exited, status=217/USER
Jan 09 16:09:42 ip-172-30-1-96.us-west-1.compute.internal systemd[1]: Unit clouddirectd.service entered failed state.
Jan 09 16:09:42 ip-172-30-1-96.us-west-1.compute.internal systemd[1]: clouddirectd.service failed.
Also:
[ec2-user#ip-172-30-1-96 ~]$ systemctl is-active clouddirectd
activating
[ec2-user#ip-172-30-1-96 ~]$ sudo systemctl list-units --type service --all | grep clouddirectd
clouddirectd.service loaded activating auto-restart CloudDirect Daemon
And my unit file is:
[ec2-user#ip-172-30-1-96 ~]$ cat /usr/lib/systemd/system/clouddirectd.service
[Unit]
Description=CloudDirect Daemon
After=network.target
[Service]
Environment=AWS_SHARED_CREDENTIALS_FILE=/etc/sonar/.aws/credentials
#ExecStart=/usr/lib/sonar/clouddirect/virtualenv/bin/python /usr/bin/sonar/clouddirectd -c /etc/sonar/clouddirect/clouddirectd.conf
ExecStart=/usr/lib/sonar/clouddirect/virtualenv/bin/python /usr/bin/clouddirect -c /etc/sonar/clouddirect.conf
# #PERM# allow group write permission on newly created files
UMask=0007
#User=clouddirectd
User=clouddirect
Group=sonar
KillSignal=SIGINT
TimeoutStopSec=60min
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
Can you suggest how to debug this systemctl service so it won't keep dying and auto restarting?
The error 217 indicate the user did not exist at the time the service tried to start. In your case the user specified in your service is clouddirect.
Main PID: 10064 (code=exited, status=217/USER)
Jan 09 16:09:42 ip-172-30-1-96.us-west-1.compute.internal systemd[1]: clouddirectd.service: main process exited, code=exited, status=217/USER
This could be caused if that is not the actual user name (for example if it has a typo), it can also be caused if the user is part of some external user store (ex: LDAP or Active Directory) and the service which needs to start that allows the Linux server to access the external user store is not up yet. For example vasd.service starts a product used to allow Linux to authenticate against Active Directory, if vasd.service is not up and you have specified a user that is only available in Active Directory you would want to add that service in your After= line. For example:
After=network.target vasd.service
There's two parts to the question. One is how to diagnose a 217/USER, the other is how to fix it. I'll just focus on the former.
For the 217/USER there's some good pointers here:
https://www.reddit.com/r/linuxquestions/comments/oaya49/systemd_service_not_starting_with_status217/
217 doesnt' "always" mean it's a user problem, it just means it exited with a 217. May or may not...
You could use journalctl to see the logs of which services "seem to come up after it does" initially or what not.
It's possible that "network users" aren't yet available at the time the system is started during boot, you can fix that by adding After=nss-user-lookup.target https://systemd.io/UIDS-GIDS/ though that's not the case here since it still fails after restarting, which is later. systemd expects the user specified to "be available" when the service starts. So for "system users" (which start early running processes) they need to be available on the local box. For later started processes they can be "network users".
You could also try changing your group and username (and environment) to what you "think" systemd is running and run it manually, see what happens. https://serverfault.com/questions/410577/execute-a-command-from-another-group
Kind of wish systemd output more debug so you could tell what it is running more easily...
In certain bizarre cases you may need to specify both User= and Group= https://superuser.com/a/1452367/39364
In our case running "vintela status" had a message "SELinux may not be configured correctly" and sure enough, after disabling SELinux, it started working as expected, no more 217. [redhat 8]
Had a weird error trying to start cntlmd on Centos 7.1.
systemctl start cntlmd` results in the following in the logs (and yes, becomming is exactly how it's spelt in the logs :)):
systemd: Started SYSV: Cntlm is meant to be given your proxy address and becoming
Weird thing is:
that it did run initially after installation.
The exact same config works perfectly on another machine (provisioned with Chef so 100% same config).
If I run it in the foreground it works but through systemd, not.
To "fix" it, I had to manually remove and reinstall, whereupon it worked again.
Anybody seen this error (Google reveals nothing) and know what's going on?
I realised that the /var/run/cntlm directory seemed to be "removed" after every boot. Turns out the /var/run/cntlm directory is never created by systemd-tmpfiles on boot (thanks to this SO answer), which then resulted in:
Feb 29 06:13:04 node01 cntlm: Using following NTLM hashes: NTLMv2(1) NT(0) LM(0)
Feb 29 06:13:04 node01 cntlm[10540]: Daemon ready
Feb 29 06:13:04 node01 cntlm[10540]: Changing uid:gid to 996:995 - Success
Feb 29 06:13:04 node01 cntlm[10540]: Error creating a new PID file
because cntlm couldn't write it's pid file because /var/run/cntlm didn't exist.
So to get systemd-tmpfiles to create the /var/run/cntlm directory on boot you need to add the following file in /usr/lib/tmpfiles.d/cntlm.conf:
d /run/cntlm 700 cntlm cntlm
Reboot and Bob's your uncle.
I'm trying to deploy my Flask app using uWSGI, but I can't seem to do it without sudo.
Here is my start script:
#!/bin/bash
set -v
set -e
cd /var/hpit/hpit_services
/var/hpit/hpit_services/env/bin/uwsgi --http [::]:80 --master --module wsgi --callable app --processes 4 --daemonize ../log/uwsgi.log --pidfile ../server_pid_file.pid
echo server started
Here is what I get in the logs:
*** Starting uWSGI 2.0.9 (64bit) on [Mon Jan 26 15:53:26 2015] ***
compiled with version: 4.8.2 on 23 January 2015 20:35:44
os: Linux-3.18.1-x86_64-linode50 #1 SMP Tue Jan 6 12:14:10 EST 2015
nodename: <<blocked out>>
machine: x86_64
clock source: unix
detected number of CPU cores: 2
current working directory: /var/hpit/hpit_services
writing pidfile to ../server_pid_file.pid
detected binary path: /var/hpit/hpit_services/env/bin/uwsgi
!!! no internal routing support, rebuild with pcre support !!!
your processes number limit is 7962
your memory page size is 4096 bytes
detected max file descriptor number: 1024
lock engine: pthread robust mutexes
thunder lock: disabled (you can enable it with --thunder-lock)
bind(): Permission denied [core/socket.c line 764]
Core/socket.c line 764 has this:
if (bind(serverfd, (struct sockaddr *) &uws_addr, addr_len) != 0) {
if (errno == EADDRINUSE) {
uwsgi_log("probably another instance of uWSGI is running on the same address (%s).\n", socket_name);
}
uwsgi_error("bind()");
uwsgi_nuclear_blast();
return -1;
}
But I don't have instances of uWSGI running. This seems to be a permissions issue. My permissions for /var/hpit, /var/hpit/hpit_services (the location of the app) and /var/hpit/log are bsauer:www-data. My user us bsauer.
If I append sudo -E to the line in my start script that calls the uWSGI binary, it seems to start fine, but I read that a server should not be started as sudo. I inherited this sysadmin role at work and am a little new to all of this.
Here are my hunches/musings:
I know that uWSGI can start as one user and drop into another user role, but I don't really understand this process, so perhaps that's the problem.
uWSGI is trying to access something on the system I'm not aware of
Thanks for your help, I can provide more details if necessary.
EDIT
Oddly, my /var/hpit/log/uwsgi.log file is owned by bsauer:bsauer, not bsauer:www-data or www-data:www-data as I would have expected...
EDIT2
Ok, from looking at http://projects.unbit.it/uwsgi/wiki/Example at the bottom of the page, it looks like the problem is running on port 80. I changed it to 8080, but its still running as bsauer, which I don't think I want.
This is what I came up with as far as I understand it, if anyone wants to put this is clearer sysadmin language I'll be happy to edit.
The solution had nothing to do with the logs after all. The problem was that port 80, the default HTTP port, is protected by the system, and only root can bind to that port. Without sudo, it won't let you bind. Binding to another port, like port 8080, worked fine.
I wanted to bind to port 80 but still run the server as www-data, so I ended up following the very bottom of this page: http://projects.unbit.it/uwsgi/wiki/Example . Basically, the socket to port 80 is shared, and uWSGI can access it first as sudo, and then drop down into www-data to run the server.
I still had to use sudo -E before calling the uWSGI binary, because it needs root permissions to change uWSGI's user and group ID, but it's ok because the end result is the server then runs in the very restricted www-data user.
In the end, my server start line was:
sudo -E /var/hpit/hpit_services/env/bin/uwsgi --shared-socket [::]:80 --http =0 --uid 33 --gid 33 --master --module wsgi --callable app --processes 4 --daemonize ../log/uwsgi.log --pidfile ../server_pid_file.pid