Ray: anyscale-academy tutorial installation throws socket exception

While installing Ray, the distributed ML framework, I installed ray 1.0.1post on a VM running CentOS 8.2. I followed the official documentation step by step, then issued the command to launch the tutorial web server:
> jupyter lab
I got an exception similar to the following:
"/usr/lib64/xxx/tornado/netutil.py", line 196, in bind_sockets sock.bind(sockaddr)
OSError: [Errno 99] Cannot assign requested address
How to fix this exception?

This exception is not related to Ray. By default CentOS leaves IPv6 only partially enabled, so the address Tornado tries to bind is not available; either enable IPv6 or disable it explicitly and the error goes away. You also need to stop firewalld (or open the relevant port) so that outside clients can reach the VM's web server.
The following commands enable IPv6:
sudo sysctl -w net.ipv6.conf.all.disable_ipv6=0
sudo sysctl -w net.ipv6.conf.default.disable_ipv6=0
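Note that sysctl -w only changes the running kernel; to make the setting survive a reboot you would also persist it somewhere under /etc/sysctl.d/ (the file name below is just an example):
echo "net.ipv6.conf.all.disable_ipv6 = 0" | sudo tee -a /etc/sysctl.d/99-ipv6.conf
echo "net.ipv6.conf.default.disable_ipv6 = 0" | sudo tee -a /etc/sysctl.d/99-ipv6.conf
sudo sysctl --system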
The following commands check and stop firewalld:
systemctl status firewalld
systemctl stop firewalld
If the status output contains the keyword running, firewalld is still running; after stopping it you should see the keyword dead:
Running:
Active: active (running) since Fri 2021-03-19 10:17:27 CST; 4 days ago
Dead:
Active: inactive (dead) since Tue 2021-03-23 10:58:10 CST; 3s ago
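If you would rather keep firewalld running, an alternative (assuming Jupyter Lab's default port 8888) is to open just that port instead of stopping the firewall:
sudo firewall-cmd --permanent --add-port=8888/tcp
sudo firewall-cmd --reload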

Related

Unable to restart Hue in AWS EMR cluster

I have an EMR cluster in AWS, configured in a CloudFormation template. In my template, I have a step that executes a script on the master node. The purpose of this script is to make changes to the hue.ini file.
The final step in the script is to restart Hue, for the changes to take effect. I'm following this documentation for the correct command. The documentation is explicit: do not run restart.
Running sudo systemctl stop hue followed by sudo systemctl start hue leaves Hue in the following state (per sudo systemctl status hue):
[root@ip-10-x-xxx-xxx ~]# sudo systemctl status hue
● hue.service - Hue web server
Loaded: loaded (/etc/systemd/system/hue.service; enabled; vendor preset: disabled)
Active: activating (auto-restart) (Result: exit-code) since Wed 2021-05-19 18:44:27 UTC; 2s ago
Process: 22743 ExecStart=/etc/init.d/hue start (code=exited, status=1/FAILURE)
Main PID: 17508 (code=exited, status=1/FAILURE)
Tasks: 0
Memory: 0B
CGroup: /system.slice/hue.service
May 19 18:44:27 ip-10-x-xxx-xxx systemd[1]: Failed to start Hue web server.
May 19 18:44:27 ip-10-x-xxx-xxx systemd[1]: Unit hue.service entered failed state.
May 19 18:44:27 ip-10-x-xxx-xxx systemd[1]: hue.service failed.
Running start again manually on the instance returns this:
Job for hue.service failed because the control process exited with error code. See "systemctl status hue.service" and "journalctl -xe" for details.
Those logs just show the same as above. I have also checked this similar question but the answer does not work for me.
EMR: emr-6.2.0
Hue: 4.8.0
After a little more research, it seems this is not the best approach. The better approach is to include a hue-ini Classification block in my CloudFormation template. This applies the changes and performs the required restart for you.
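For illustration, such a classification on the AWS::EMR::Cluster resource might look roughly like this (the desktop/http_port entry is only an example; substitute the hue.ini settings the script was editing):
Configurations:
  - Classification: hue-ini
    Configurations:
      - Classification: desktop
        ConfigurationProperties:
          http_port: "8888"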

etcd not starting, getting error while probing status

I am trying to install Kubernetes the hard way. Instead of installing on GCP or using VirtualBox, I am installing on AWS, using a Red Hat Linux image.
I am using 3 nodes for etcd in a stacked topology. However, while starting the etcd service, it gives the following error:
Loaded: loaded (/etc/systemd/system/etcd.service; enabled; vendor preset: disabled)
Active: activating (auto-restart) (Result: exit-code) since Sat 2020-08-01 17:03:37 UTC; 2s ago
Docs: https://github.com/coreos
Process: 1816 ExecStart=/usr/local/bin/etcd --name ip-172-31-60-0 --cert-file=/etc/etcd/etcd-server.crt --key-file=/etc/etcd/etcd-server.key --peer-cert-file=/etc/etcd/etcd-server.crt --peer-key-file=/etc/etcd/etcd-server.key --trusted-ca-file=/etc/etcd/ca.crt --peer-t>
Main PID: 1816 (code=exited, status=203/EXEC)
I tried to look for a solution but can't get past it. How can I resolve this?
Not sure if this will solve your case, but you could try to start the key-value store from scratch using the following commands:
ETCDCTL_API=3 etcdctl del "" --from-key=true
sudo systemctl stop etcd
sudo rm -rf /var/lib/etcd/default.etcd
sudo systemctl start etcd
If that doesn't work, maybe the result of systemctl status etcd may help troubleshooting...
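In the question the status shows status=203/EXEC, which means systemd could not execute the ExecStart binary at all. In that case it is worth confirming that the binary really is at /usr/local/bin/etcd and is executable, and checking the unit's logs:
ls -l /usr/local/bin/etcd
sudo chmod +x /usr/local/bin/etcd    # if it exists but is not executable
sudo journalctl -u etcd --no-pager -n 20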

No welcome page after openCPU install in fresh Ubuntu 18.04 instance

I am trying to install OpenCPU 2.1 on a fresh free tier AWS server. I followed
[https://aws.amazon.com/getting-started/tutorials/?awsf.getting-started-content=use-case-tmt%23websites-apps], launched an
Ubuntu Server 18.04 LTS (HVM), x86, free tier server, and obtained
public and private IPs. Then I followed [https://opencpu.github.io/server-manual/opencpu-server.pdf], section 2.2:
sudo apt-get update
sudo apt-get upgrade
I responded to the GRUB prompt to keep the locally installed version, then ran:
sudo add-apt-repository ppa:opencpu/opencpu-2.1 -y
sudo apt-get update
sudo apt-get install opencpu-server
and accepted the defaults at the mailname and smarthost prompts.
The results all look OK. The last section reads:
To activate the new configuration, you need to run:
systemctl restart apache2
Enabling opencpu in apache...
Reloading apparmor...
Restarting apache...
Installation done!
Setting up libxml-twig-perl (1:3.50-1) ...
Setting up libnet-dbus-perl (1.1.0-4build2) ...
Processing triggers for libc-bin (2.27-3ubuntu1) ...
Processing triggers for systemd (237-3ubuntu10.13) ...
Processing triggers for ureadahead (0.100.0-20) ...
Processing triggers for ufw (0.35-5) ...
When I then point my browser to http(s)://your.server.com/ocpu (with the server name replaced by the public IP I got from AWS, and using either http:// or https://), I get a time-out in the browser window after a minute or so.
Checking sudo systemctl status apache2.service provides
● apache2.service - The Apache HTTP Server
Loaded: loaded (/lib/systemd/system/apache2.service; enabled; vendor preset: enabled)
Drop-In: /lib/systemd/system/apache2.service.d
└─apache2-systemd.conf
Active: active (running) since Thu 2019-02-28 09:41:19 UTC; 1min 14s ago
Process: 30750 ExecStop=/usr/sbin/apachectl stop (code=exited, status=0/SUCCESS)
Process: 30755 ExecStart=/usr/sbin/apachectl start (code=exited, status=0/SUCCESS)
Main PID: 30771 (apache2)
Tasks: 6 (limit: 1152)
CGroup: /system.slice/apache2.service
├─30771 /usr/sbin/apache2 -k start
├─30773 /usr/sbin/apache2 -k start
├─30774 /usr/sbin/apache2 -k start
├─30775 /usr/sbin/apache2 -k start
├─30776 /usr/sbin/apache2 -k start
└─30777 /usr/sbin/apache2 -k start
Feb 28 09:41:19 ip-zzz-zz-zz-zz systemd[1]: Stopped The Apache HTTP Server.
Feb 28 09:41:19 ip-zzz-zzz-zz-zz systemd[1]: Starting The Apache HTTP Server...
Feb 28 09:41:19 ip-zzz-zzz-zz-zz systemd[1]: Started The Apache HTTP Server.
which seems OK. Also, trying a restart:
sudo a2ensite opencpu
Site opencpu already enabled
does not activate the welcome page. Is there something else that needs to be activated or set?
First try to connect to the server locally to test if it is running. On the server, run:
curl --insecure http://localhost/ocpu/info
If you get a response with some info about the server, OpenCPU is running and the issue is likely that the Amazon security group is blocking HTTP traffic. See the section below on how to enable this.
On the other hand, if the curl command above did not work (it gave a timeout error), there is a problem with the server and you need to check /var/log/apache2/error.log.
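For example (assuming the default Apache log location mentioned above):
sudo tail -n 50 /var/log/apache2/error.log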
Enable HTTP(S) in the Amazon security group (firewall)
If you still cannot connect from your browser, the issue is likely that you have not opened the HTTP ports in your EC2 firewall (security group). To check this, open the EC2 management console in your browser and look up the security group associated with your EC2 instance. Then add inbound rules to this security group to allow ports 80 and 443 from any host.
First look up the security group that is associated with your instance, and then add inbound rules to allow port 80 (HTTP) and 443 (HTTPS).
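If you prefer the command line, the same rules can be added with the AWS CLI (the security group ID below is a placeholder):
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 80 --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 443 --cidr 0.0.0.0/0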

couchdb installed via snapd on OpenSuse not working

I've installed CouchDB 2.0 via snap onto openSUSE Tumbleweed.
sudo snap install couchdb
Then I ran
sudo systemctl enable --now snapd.socket
Everything works fine until I log out. In my new session I cannot get CouchDB running.
Would anyone know of a solution please?
Some more info:
systemctl status snapd
gives:
Loaded: loaded (/usr/lib/systemd/system/snapd.service; disabled; vendor preset: disabled)
Active: active (running) since Sat 2018-07-28 16:33:45 NZST; 4min 10s ago
May 12 20:31:04 hobbes systemd[1]: Starting Snappy daemon...
May 12 20:31:04 hobbes snapd[4705]: AppArmor status: apparmor is enabled but some features are missing: dbus
May 12 20:31:04 hobbes snapd[4705]: 2018/05/12 20:31:04.773100 daemon.go:323: started snapd/2.32.5-1.10 (series 16; classic; devmode) opensuse-tumbleweed/20180502 (amd>
May 12 20:31:04 hobbes systemd[1]: Started Snappy daemon.
Some feedback from #suse channel:
The CouchDB snap failure is due to AppArmor; it seems to block starting the service. Try running sudo apparmor_parser -r /var/lib/snapd/apparmor/profiles/* and then snap start couchdb.
To fix it so you don't have to run that every time, save https://paste.opensuse.org/33232726 as /etc/systemd/system/snapd.apparmor.service and run systemctl enable snapd.apparmor.service, then reboot and try snap start couchdb again.
The contents of the (expiring) paste:
[Unit]
Description=Load AppArmor profiles managed internally by snapd
DefaultDependencies=no
Before=sysinit.target
Requisite=snapd.service
After=apparmor.service
ConditionSecurity=apparmor
[Service]
Type=oneshot
ExecStart=/usr/lib/snapd/snapd-apparmor start
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
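After saving that file, picking it up and enabling it (before the reboot suggested above) would look like:
sudo systemctl daemon-reload
sudo systemctl enable snapd.apparmor.service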

google compute startup script starting postgresql

I have a startup script, set by an instance template, that initializes a Google Compute Engine server. After installing PostgreSQL, I manually start it with:
/etc/init.d/postgresql start
This completes successfully, but the server is not listening on 5432 when run from the startup script (postgres isn't started, although that service start call completes successfully). After startup completes and I log in, I can start it successfully. Does anyone know why that won't work within the startup script? I need to load data during startup, so I need to start postgres during initialization.
I solved it by using a newer Debian image.
I had the same problem as you (installing postgresql in a GCE startup script results in the package being installed, but the server not running), and I think I figured out the root cause.
Normally, the postgresql-11 package is supposed to start the PostgreSQL server after installation. Here is a snippet from its postinst script:
if [ "$1" = configure ]; then
. /usr/share/postgresql-common/maintscripts-functions
configure_version $VERSION "$2"
fi
Taking a look at /usr/share/postgresql-common/maintscripts-functions, we see:
configure_version() {
    ...
    # reload systemd to let the generator pick up the new unit
    if [ -d /run/systemd/system ]; then
        systemctl daemon-reload
    fi
    invoke-rc.d postgresql start $VERSION # systemd: argument ignored, starts all versions
}
My debian installation comes with init-system-helpers version "1.56+nmu1", which contains this bit of code in invoke-rc.d:
# avoid deadlocks during bootup and shutdown from units/hooks
# which call "invoke-rc.d service reload" and similar, since
# the synchronous wait plus systemd's normal behaviour of
# transactionally processing all dependencies first easily
# causes dependency loops
if ! systemctl --quiet is-active multi-user.target; then
    sctl_args="--job-mode=ignore-dependencies"
fi
case $saction in
    start|restart|try-restart)
        [ "$_state" != "LoadState=masked" ] || exit 0
        systemctl $sctl_args "${saction}" "${UNIT}" && exit 0
        ;;
The Debian postgresql-11 package makes use of templated systemd units. The main one is called postgresql.service, but this is a dummy service that doesn't actually do anything. The PostgreSQL server is actually started by a templated unit named postgresql@11-main, which is usually started alongside the main service because it has ReloadPropagatedFrom=postgresql.service.
Note that when this issue occurs, the main unit is started but the templated one is not:
$ sudo systemctl status postgresql
● postgresql.service - PostgreSQL RDBMS
Loaded: loaded (/lib/systemd/system/postgresql.service; enabled; vendor preset: enabled)
Active: active (exited) since Fri 2021-04-02 05:40:48 UTC; 32min ago
Main PID: 1663 (code=exited, status=0/SUCCESS)
Tasks: 0 (limit: 4665)
Memory: 0B
CGroup: /system.slice/postgresql.service
Apr 02 05:40:48 hubnext-west-r21r systemd[1]: Starting PostgreSQL RDBMS...
Apr 02 05:40:48 hubnext-west-r21r systemd[1]: Started PostgreSQL RDBMS.
$ sudo systemctl status postgresql@11-main
● postgresql@11-main.service - PostgreSQL Cluster 11-main
Loaded: loaded (/lib/systemd/system/postgresql@.service; enabled-runtime; vendor preset: enabled)
Active: inactive (dead)
That's because when --job-mode=ignore-dependencies is specified, this link is ignored.
The GCE startup script runs as a systemd unit, which starts before multi-user.target is up:
$ find /etc/systemd | grep startup
/etc/systemd/system/multi-user.target.wants/google-startup-scripts.service
Therefore, invoke-rc.d notices that systemctl --quiet is-active multi-user.target is false and adds --job-mode=ignore-dependencies, which results in the PostgreSQL server not starting.
One possible workaround is explicitly running systemctl start postgresql@11-main.service from your startup script after installing postgres.
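A minimal sketch of that workaround in a startup script, assuming the postgresql-11 package and the default 11-main cluster name:
apt-get install -y postgresql-11
# invoke-rc.d may have skipped starting the cluster during early boot, so start it explicitly
systemctl start postgresql@11-main.service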
By the way, I noticed that a recent commit (Nov 2020) changed this invoke-rc.d behavior so that it no longer uses --job-mode=ignore-dependencies. That would help avoid this issue.