I can't manage to get an icecc daemon to connect to the local icecc-scheduler from any machine running Fedora 20.
I've had no trouble setting this up on 5 different Ubuntu 14.04 machines, and each can run the scheduler without issue. In fact, it appears to work out of the box with no additional config on Ubuntu - simply install and go.
On those Ubuntu machines it's just
sudo apt-get install icecc
sudo service iceccd start
And on one of the machines
sudo service icecc-scheduler start
Then I simply set the path and build like so
export PATH=/usr/lib/icecc/bin:$PATH
make -j16
This is all that is needed to get the distributed compile working on Ubuntu as far as I can see.
On Fedora I install and start with
sudo yum install icecream.x86_64
sudo systemctl start iceccd
And I compile with
export PATH=/usr/libexec/icecc/bin:$PATH
make -j16
This doesn't distribute the compile.
The icemon utility on the scheduler does not show any evidence of the Fedora machine either, and running a status check on the iceccd service gives this error:
Jul 21 09:44:08 Fedora20 iceccd[4642]: [4642] 09:44:08: scheduler not yet found.
So far the only thing I've tried that might have addressed the issue is opening up the ports the README lists by adding them to the Zones->Ports section of Firewall Configuration, but this hasn't helped.
Maybe there is something I need to do on the Ubuntu scheduler and daemons? Has anyone else had any luck setting up icecream on Fedora 20?
For future devs who might come here from Google:
To get icecc working I edited the /usr/lib/systemd/system/icecc/iceccd-wrapper file by adding two arguments to the iceccd command:
-s <scheduler> -m <number of jobs>
Then when running the following command
sudo systemctl start iceccd
the daemon starts up and is seen by the scheduler.
Remember the ports also need to be open!
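If you prefer the command line over the Firewall Configuration GUI mentioned in the question, the same ports can be opened with firewall-cmd. A sketch, assuming the default icecream ports from the README (10245/tcp for the daemon, 8765/tcp+udp for the scheduler, 8766/tcp for the monitor) - double-check them against the README for your version:
sudo firewall-cmd --permanent --add-port=10245/tcp
sudo firewall-cmd --permanent --add-port=8765/tcp
sudo firewall-cmd --permanent --add-port=8765/udp
sudo firewall-cmd --permanent --add-port=8766/tcp
sudo firewall-cmd --reload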
Instead of editing either /usr/lib/systemd/system/icecc/iceccd-wrapper (as proposed by foips) or /usr/lib/systemd/system/iceccd.service itself, I found it more convenient to modify the global icecream settings file /etc/sysconfig/icecream and set
# If the daemon can't find the scheduler by broadcast (e.g. because
# of a firewall) you can specify it.
#
ICECREAM_SCHEDULER_HOST="<scheduler>"
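After changing it, restart the daemon so the new scheduler host takes effect (the unit name matches the one used elsewhere in this thread):
sudo systemctl restart iceccd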
On Ubuntu 20.04 with ICECC 1.3.1 the config file is /etc/icecc/icecc.conf and the setting is called ICECC_SCHEDULER_HOST. You need to put the scheduler IP there.
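A minimal sketch, assuming a scheduler reachable at a placeholder address:
# in /etc/icecc/icecc.conf (placeholder IP - use your scheduler's address)
ICECC_SCHEDULER_HOST="192.168.1.10"
# then restart the daemon so it reconnects
sudo systemctl restart iceccd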
I created a VM using the GCP Console in the browser.
While creating the VM, I selected the image "c2-deeplearning-pytorch-1-8-cu110-v20210619-debian-10" and a T4 GPU.
The VM gets created and started, and it shows a green icon in the browser.
Then I try to connect with "gcloud compute ssh" and it asks if I want to install the NVIDIA driver. I answer Y, but it then fails with a dpkg error and the driver is not installed:
This VM requires Nvidia drivers to function correctly. Installation
takes ~1 minute. Would you like to install the Nvidia driver? [y/n] y
Installing Nvidia driver. install linux headers:
linux-headers-4.19.0-16-cloud-amd64 E: dpkg was interrupted, you must
manually run 'sudo dpkg --configure -a' to correct the problem.
Nvidia driver installed.
I try to verify whether the driver is installed by running this Python code:
import torch
torch.cuda.is_available() #returns False.
Anybody else faced this issue?
This is the correct way to install the NVIDIA driver on a GCP instance:
cd /
sudo apt purge nvidia-*
Reboot
cd /
sudo wget https://developer.download.nvidia.com/compute/cuda/11.2.2/local_installers/cuda_11.2.2_460.32.03_linux.run
sudo sh cuda_11.2.2_460.32.03_linux.run
Adjust your configuration accordingly as the installer presents options in the terminal
Reboot
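If you want to script this rather than click through the interactive prompts, recent CUDA runfile installers accept a silent mode that installs only the driver - a sketch; run the installer with --help to confirm the exact flags for your version:
sudo sh cuda_11.2.2_460.32.03_linux.run --silent --driver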
The solution to my problem was:
Run manually: sudo dpkg --configure -a
Disconnect from the machine.
Connect again using SSH. Select Y again when asked to install the NVIDIA driver.
It works then.
Make sure you are running as root. I know this sounds silly, but if you use their notebook instances the default user is not root, and if you SSH into the instance and run something like gpustat or other custom code, you might get errors like "NVIDIA drivers are not loaded".
If you make sure your user (called jupyter in the default case) is in the sudoers file, then all will work fine.
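A minimal sketch of that, assuming the default jupyter user and the usual Debian sudo group on these images:
sudo usermod -aG sudo jupyter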
It is often very complicated to install or reinstall GPU drivers on GCP instances. Make sure you actually need to reinstall before you attempt other solutions.
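A quick way to check whether a working driver is already loaded before reinstalling anything (assuming the standard NVIDIA userland tools are present):
nvidia-smi              # should list the GPU (e.g. the T4) if the driver is loaded
lsmod | grep -i nvidia  # should show the nvidia kernel modules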
I wonder if there is a way to disable automatic updates (yum update) of our Linux machines on Google Cloud.
As far as I know, during the maintenance window our servers get new software packages installed (I checked yum.log). Since our installed software must be a specific version (not the latest), we don't want Google to run updates for us, because it usually breaks all kinds of dependencies...
I have searched Google but didn't find any info about this.
Thanks.
The CentOS 7 image used in Compute Engine includes yum-cron, installed and enabled by default. You can verify this with either of the following commands:
sudo yum list installed yum-cron
sudo systemctl status yum-cron.service
yum-cron will periodically check for updates and apply them if any are available.
Solution
If you have yum-cron running on your instance, you can disable auto-updates by editing the configuration file /etc/yum/yum-cron.conf and changing the following variables to 'no':
update_messages = no
download_updates = no
apply_updates = no
This will prevent the system from updating automatically.
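If you would rather keep the package installed but stop it from running at all, disabling the service also works:
sudo systemctl stop yum-cron.service
sudo systemctl disable yum-cron.service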
As an alternative, you can uninstall the package from your system using the following command:
sudo yum remove yum-cron
This part is missing from the official documentation, so it will be added soon.
I'm following the "Installing CKAN from source" guide. And in the step to start the jetty service: sudo service jetty start. But it doesn't work, it prints "Failed to start jetty.service: Unit jetty.service not found".
Now, if instead that command, I use: sudo /etc/init.d/jetty8 start, the server starts correctly.
So, my guess (not totally sure) is that the jetty.home is not set properly.
For what it's worth, I'm using Ubuntu 16.04, running in virtualbox.
Thanks in advance to anyone who can help me.
P.S: If additional information is needed, please let me know.
For Ubuntu 16, just run sudo systemctl unmask jetty8 and then sudo service jetty8 start.
If sudo /etc/init.d/jetty8 start works then you should be able to use
sudo service jetty8 start
(note the use of jetty8 instead of jetty).
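If you're not sure which unit or init script name your installation actually registered, you can list the candidates first, for example:
systemctl list-unit-files | grep -i jetty
ls /etc/init.d/ | grep -i jetty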
I'm following the old tutorials off of gettingstartedwithdjango.com.
This series is quite old and I'm new to Django, which is why I hit that site in the first place. It became my first introduction to Vagrant. Because the series is old and there are now new versions of Django, Vagrant, etc., just getting through the first tutorial was quite difficult. This was mainly the gap between Django v1.4 and the current version 1.9, which is what I'm running, including some syntax differences in settings.py and also some discrepancies between the text errata and the video which I had to sort through. It's a pretty detailed exercise if you're completely new to all of this (which I am), so it was quite challenging, and I was thrilled when, after probably 12 hours of solid effort, I was able to get everything working as described in the video but using all new versions of the software.
Once I got it all set up and working, I halted my Vagrant VM for the night, and when I brought it back up (vagrant up) the next morning, I found that the VM would no longer mount its shared folders, essentially rendering the Vagrant VM useless to me, since I'm then unable to run code that resides on the host machine (I'm running Windows 7) from within the VM (which is accomplished via the VirtualBox shared folders feature).
Not knowing what was wrong, this prompted me to completely reinstall my Vagrant VM. I was able to get things redeployed successfully with about an hour's worth of effort, backtracking through the steps I had taken to complete the first tutorial, in order to get back to the same point where I started (before the previous vagrant halt).
When reinstalling the Vagrant VM I noticed messages that my VirtualBox Guest Additions (4.2.0) did not match the version of VirtualBox I have installed (5.0.10), which I recalled seeing the first time but ignored, because it also said this isn't usually a problem and should work (if it weren't for bad luck, I'd have no luck at all). Since for me it didn't work, this led me down a whole rabbit hole of posts from various websites including SO, which ultimately had me updating my Vagrant VM, downloading/mounting/building/installing a new version of the VirtualBox Guest Additions, and reloading my Vagrant VM, only to wind up in the same boat. Shared folders were still not working!
To be very specific, this is more or less what I tried based on information from many websites:
cd /home/vagrant
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install dkms build-essential linux-headers-generic
sudo apt-get install linux-headers-3.2.0-23-generic
# mounted VBoxGuestAdditions_5.0.10.iso to Vagrant VM
cd /media/cdrom
sudo sh ./VBoxLinuxAdditions.run
sudo reboot
sudo /etc/init.d/vboxadd setup
sudo reload
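# then, from the Windows 7 host: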
c:\VAGRANT\vagrant plugin install vagrant-vbguest
c:\VAGRANT\vagrant reload
I was utterly convinced this was going to resolve the issue but it didn't.
Then I found this gem:
http://ddelizia.blogspot.com/2011/02/how-to-share-folder-on-virtual-box-with.html
This shows how to mount the shared folders, from within the Vagrant VM, back to your Windows 7 host. To be specific, here's what I did, after which things were working again:
cd /vagrant
ls -la
# this yielded nothing
sudo mount.vboxsf vagrant /vagrant # see your VirtualBox Shared Folders settings
cd ~ # /home/vagrant
cd - # /vagrant
ls -la
# this yielded the expected folders from my Win7 host
In the mount.vboxsf command above, the first vagrant (without the /) comes from the Name column in my VirtualBox shared folders settings. It is essentially an alias used to refer to the actual path on the Win7 host, in my case C:/VAGRANT. The second vagrant (with the /) in that command is the /vagrant folder in the Vagrant VM (Linux).
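If you'd rather not retype the remount every time the share drops, one low-tech option is to add the same mount command near the end of /etc/rc.local inside the VM (a sketch, reusing the share name and mount point from above; on Ubuntu, exit 0 must remain the last line of that file):
mount.vboxsf vagrant /vagrant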
Given that I spent most of today messing with this and I figure there are plenty of people who are going to run into this same or related problem, I thought I would try to help out and save you all a bunch of time. Good luck.
This is not about the Vagrant or VirtualBox guest running slowly due to slow shared folder access; we know that can more or less be resolved by enabling NFS.
Rather, it's about the mounted shared folder going out of sync when there are many file operations within the VM (enabling NFS does not prevent this from happening).
For example, when we are installing packages inside the VM, e.g. with PHP Composer or Node.js npm, there is a certain probability that a normal composer update or npm install will fail, and once it fails, only a vagrant reload will restore the synced folder and let the same command pass without problems.
Such random failures only happen when executing on the shared folder (NFS or not), so apt-get upgrade won't trigger the same problem, since it runs within the VM's own folders.
Since the same sync problem does not appear when we run composer or npm from the host machine, I am wondering what could have caused it and how we should go about debugging it.
Our Vagrant setup and config:
if Vagrant::Util::Platform.windows?
config.vm.synced_folder "www", "/var/www", :extra => "dmode=777,fmode=777", :owner => "vagrant", :group => "vagrant"
else
config.vm.synced_folder "www", "/var/www", :extra => "dmode=777,fmode=777", :nfs => true
end
Guest: Ubuntu 12.04 LTS x64
Host: Windows 8, Mac OSX 10.8, Ubuntu 13 (yes, they all run into the same problem randomly)
I think we have more or less discovered the source of the problem:
The Guest Additions version (4.1.x) that comes with our Ubuntu 12 LTS box does not match the current VirtualBox version (4.2.x) installed on the host machine, so file sync fails.
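One way to confirm this kind of mismatch is to compare the Guest Additions version inside the guest with the VirtualBox version on the host, e.g.:
modinfo vboxguest | grep -iw version   # inside the guest
VBoxManage --version                   # on the host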
The easy fix:
Run this command within the VM to remove the old Guest Additions: sudo apt-get -y -q purge virtualbox-guest-dkms virtualbox-guest-utils virtualbox-guest-x11
Install the vagrant-vbguest plugin so future updates are taken care of automatically during vagrant up (see the commands below): https://github.com/dotless-de/vagrant-vbguest
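In practice the plugin step means running, from the host:
vagrant plugin install vagrant-vbguest
vagrant reload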