This is script content, located in /etc/init.d/myserviced:
#!/lib/init/init-d-script
DAEMON="/usr/local/bin/myprogram.py"
NAME="myserviced"
DESC="The description of my service"
When I start the service (either by calling it directly or by calling sudo service myserviced start), I can see program myprogram.py run, but it did not return to command prompt.
I guess there must be something that I misunderstood, so what is it?
The system is Debian, running on a Raspberry Pi.
After more works, I finally solved this issue. There are 2 major reasons:
init-d-script actually calls start-stop-daemon, who don't work well with scripts specified via --exec option. When killing scripts, you should only specify --name option. However, as init-d-script always fill --exec option, it cannot be used with script daemons. I have to write the sysv script by myself.
start-stop-daemon won't magically daemonize the thing you provide. So the executable provided to start-stop-daemon should be daemonized itself, but not a regular program.
I have fortran code that has been parallelized with OpenMP. I want to test my code on my PC before running on HPC. My PC has double core CPU and I work on Linux-mint. I installed gfortranmultilib and this is my script:
#!/bin/bash
### Job name
#PBS -N pme
### Keep Output and Error
#PBS -j eo
### Specify the number of nodes and thread (ppn) for your job.
#PBS -l nodes=1:ppn=2
### Switch to the working directory;
cd $PBS_O_WORKDIR
### Run:
OMP_NUM_THREADS=$PBS_NUM_PPN
export OMP_NUM_THREADS
ulimit -s unlimited
./a.out
echo 'done'
What should I do more to run my code?
OK, I changed script as suggested in answers:
#!/bin/bash
### Switch to the working directory;
cd Desktop/test
### Run:
OMP_NUM_THREADS=2
export OMP_NUM_THREADS
ulimit -s unlimited
./a.out
echo 'done'
my code and its executable file are in folder test on Desktop, so:
cd Desktop/test
is this correct?
then I compile my simple code:
implicit none
!$OMP PARALLEL
write(6,*)'hi'
!$OMP END PARALLEL
end
by command:
gfortran -fopenmp test.f
and then run by:
./a.out
but only one "hi" is printed as output. What should I do?
(and a question about this site: in situation like this I should edit my post or just add a comment?)
You don't need and probably don't want to use the script on your PC. Not even to learn how to use such a script, because these scripts are too much connected to the specifics of each supercomputer.
I use several supercomputers/clusters and I cannot just reuse the script from one at the other, because they are so much different.
On your PC you should just do:
optional, it is probably the default
export OMP_NUM_THREADS=2
to set the number of OpenMP threads to 2. Adjust if you need some other number.
cd to the working directory
cd my_working_directory
Your working directory is the directory where you have the required data or where the executable resides. In your case it seems to be the directory where a.out is.
run the damn thing
ulimit -s unlimited
./a.out
That's it.
You can also store the standard output and error output to a file
./out > out.txt 2> err.txt
to mimic the supercomputer behaviour.
The PBS variables are only set when you run the script using qsub. You probably don't have that on your PC and you probably don't want to have it either.
$PBS_O_WORKDIR is the directory where you run the qsub command, unless you set it differently by other means.
$PBS_NUM_PPN is the number you indicated in #PBS -l nodes=1:ppn=2. The queue system reads that and sets this variable for you.
The script you posted is for Portable Batch System (https://en.wikipedia.org/wiki/Portable_Batch_System) queue system. That means, that the job you want to run on the HPC infrastructure has to go first into the queue system and when the resources are available the job will run on the system.
Some of the commands (those starting with #PBS) are specific commands for this queue system. Among these commands, some allow the user to indicate the application process hierarchy (i.e. number of processes and threads). Also, keep in mind that since all the PBS commands start by # they are ignored by regular shell script execution. In the case you presented, that is given by
### Specify the number of nodes and thread (ppn) for your job.
#PBS -l nodes=1:ppn=2
which as the comment indicates it should tell the queue system that you want to run 1 process and each process will have 2 threads. The queue system is likely to pass these parameters to the process launcher (srun/mpirun/aprun/... for MPI apps in addition to OMP_NUM_THREADS for OpenMP apps).
If you want to run this job on a computer that does not have PBS queue, you should be aware at least of two things.
1) The following command
### Switch to the working directory;
cd $PBS_O_WORKDIR
will be translated into "cd" because the environment variable PBS_O_WORKDIR is only defined within the PBS job context. So, you should change this command (or execute another cd command just before the execution) in order to fix where you want to run the job.
2) Similarly for PBS_NUM_PPN environment variable,
OMP_NUM_THREADS=$PBS_NUM_PPN
export OMP_NUM_THREADS
this variable won't be defined if you don't run this within a PBS job context, so you should set OMP_NUM_THREADS to the value you want (2, according to your question) manually.
If you want your linux box environment to be like an HPC login node. You can do the following
Make sure that your compiler supports OpenMP, test a simple hello world program with OpenMP flags
Install OpenMPI on your system from your favourite package manager or download the source/binary from the website (OpenMPI Download)
I would not recommend installing cluster manager like Slurm for your experiments
After you are done, you can execute your MPI programs through the mpirun wrapper
mpirun -n <no_of_cores> <executable>
EDIT:
This is assuming that you are running this only MPI. Note that OpenMP utilizes the cores as well. If you are running MPI+OpenMP - n*OMP_NUM_THREADS=cores on a single node.
I have a few Python scripts, all of them involving while True: and a wait timer so they run at varying intervals. They do things like monitor a serial port and look for new versions of my code on a remote server. I haven't used cron because some require offsets (e.g run at ten seconds past the minute) and I wanted to keep things very simple.
Using rc.local, I run hook.py on startup. What can I put in hook.py to run a.py, b.py and c.py simultaneously and continuously? I tried subprocess (with shell = True) but I'm not sure the next line / next subprocess command will execute until the first one finishes - which will never happen. Plus it has some weird behaviour I'm struggling to debug (I can rw files using their absolute paths if I run the script directly; when subprocess runs them, it can't find the files).
Any suggestions? Just want something simple that can simultaneously execute several new python scripts. Platform is a Raspberry Pi.
Alternatively: if there's code I can put in rc.local that will spawn a new python process for all .py files in a specified directory, that would work too.
This sounds like it would be better suited for spawning via cron instead of an infinite while loops.
But if you want to continue running them in rc.local just put the & at the end of your command:
/usr/bin/python /home/you/command.py &
This runs the command in the background.
If you want to run all Python files in a given directory I would write a bash script like:
for file in /home/you/*.py
do
if [ "$?" == "0" ]
then
/usr/bin/python "$file" &
fi
done
We will need more information about your path issues to tell you more.
I recently reinstalled Cygwin on my computer in order to get access to several command line elements that I was missing. I have never had previous difficulty with Cygwin, but after this reinstallation, an error message continues to appear after (almost) each command entered. For instance:
-bash-4.1$ wc m1.txt
3 [main] bash 2216 child_info_fork::abort: data segment start: parent(0x26D000) != child(0x38D000)
-bash: fork: retry: Resource temporarily unavailable
2013930 4027950 74968256 m1.txt
Generally, the command still runs (as seen above), but not always. Occasionally, the 'error' message occurs several times in a row (the initial number "3" will then change to a "4" or "2", notably if I start a second Cygwin window.
Also, as soon as I start up Cygwin, I get the following message before the prompt:
3 [main] bash 6140 child_info_fork::abort: data segment start: parent(0x26D000) != child(0x36D000)
-bash: fork: retry: Resource temporarily unavailable
-bash: fork: Resource temporarily unavailable
-bash-4.1$
At the moment, I am debating whether to uninstall/reinstall Cygwin again or just live with the error messages, but I was curious if there might be an issue that I am unaware of.
(assuming Cygwin is installed at C:\Cygwin):
Open Task Manager and close any processes that look to be Cygwin related.
Open C:\Cygwin\bin in Windows Explorer
Verify that dash.exe, ash.exe, rebase.exe, and rebaseall exist in this folder
If any of them are missing, re-run Cygwin setup and select the dash, ash, and rebase packages
right-click your C:\Cygwin folder, uncheck Read-only (if its checked), and press OK.
When an error about not being able to switch some files comes up, select "Ignore All". Wait for this process to complete.
Browse to C:\Cygwin\bin in Windows Explorer
Right click dash.exe and click "Run as Administrator". A command Prompt should appear with nothing but a $
Type /usr/bin/rebaseall -v, hit enter, and wait for the process to complete.
If you get errors about Cygwin processes running, try Step 1 again. If that still doesn't work, Restart your computer into safe mode and try these steps again.
A commenter noted that, depending on your settings, you may have to type cd /usr/bin && ./rebaseall -v instead.
Try opening Cygwin again.
This process worked for me. I hope it works for you guys too.
Source: http://cygwin.wikia.com/wiki/Rebaseall
I would like to add the following to the above answers, as it is what I had to do after reinstalling Cygwin:
Navigate to the "/usr/bin" directory (usually, C:\cygwin\bin) and right click, Run as Administrator the file: dash.exe
Then, at the $ prompt type the following, hitting enter after each line:
cd /usr/bin/
/usr/bin/peflags * -d 1
/usr/bin/rebaseall -v
What it does is, it marks the dll's as "rebase-able," and then rebases them. You have to have peflags.exe in addition to the above files (in previous answers). You may have to restart windows after doing this and you will definitely need to make sure that there are no processes nor services belonging to cygwin running. (Use task manager, kill any related processes, and then under the services tab look for any service starting with CYG and stop it.)
After doing this, I was able to get cygwin to run without any errors about dll's being loaded to the wrong addresses aka fork errors, etc.
I hope that this helps others, as it was a pain to find.
SOURCE: http://www.cygwin.com/faq.html#faq.using.fixing-fork-failures
and the rebase README file.
To add on to other answers here, we ran into the same issue but could not run the rebase command from the ash or dash shell. However, when launching the command from the Windows cmd shell, the following worked.
cmd /c "C:\cygwin64\bin\ash.exe /usr/rebaseall -v"
-v is to get verbose output
I found another information here :
http://cygwin.com/ml/cygwin/2014-02/msg00531.html
You have to delete the database at
/etc/rebase.db* and do in a "ash" windows :
peflags * -d 1
rebaseall
It works for me on 2 servers.
I solved this problem by restarting my computer. Probably installed a driver update and kept using sleep instead of shutting down.
Experienced the same issue when loading Cygwin with cygiconv-2.dll forking and not loading successfully in the Cygwin terminal, but after turning off my AntiVirus (it was specifically Ad-aware), the issue resolved, and Cygwin worked properly.
In case you are using babun's Cygwin, after rebaseall, try launching Cygwin by executing .babun\cygwin\cygwin.bat in a Windows command prompt or Windows explorer.
This works for me (while launching babun's default console - mintty results in fork error).
I had the error on win10 and i was trying to rebase to c: before install.
then i saw that the installer was installing it instead to c:/Users/myuser
so i was coping all files from c:/Users/myuser to c:.badun
and then restart plus open badun.bat
not shure if this was wise its now duplicated XD... but then it worked again.
Rebaseing didn't help in my case. In addition to what other people suggested, I noticed that reducing the length of PATH environment variable fixed the issue for me (and for other people as well as can be seen from this answer).
This issue is intermittent in nature & I found this issue when there is network is too slow to connect to remote machine on AWS.... I have Shell script that runs through Gitbash shell & it connects to AWS EC2 instance with ssh..... Most of the time, it ran correctly but 2 out 100 times it get into this issue bash: fork: retry: Resource temporarily unavailable .... Killing the MSYS2 terminal from task manager helps to overcome with this issue....
Negative side is you need to run the scripts from the beginning...
I had the same issue on Windows 10 and the mobaxterm app (which uses cygwin) and I tried all of answers listed here however for me, the solution was to simply delete the "CryptoPro CSP" application.
I started facing this problem after upgrading to windows 10. As of now I do not see that any of the above method working.
What I am noticing is that if you start cygwin with admin right (right click and say "run as admin") then it works fine.
Or you open cmd as administrator and then launch cygwin from there, then also it runs fine.
Just reinstall cygwin and select TCL and activate EXPECT
I have an executable, which has a few system commands (basically it does copying and running script files). When I test in standalone (launching the executable myself as with sudo) system() is working fine.
Now I integrate my executable with supervisord. Functionality of my executable is working fine, but the system() command fails with 255 / -1 (8 bit representation).
List of things I checked:
The current working directory of the process is correct
Supervisord and my process are running as root
chown of directory and file are root
Any other suggestions?
system("sudo cp ./Scripts/x.sh /tmp/");
sudo is the command to obtain superuser rights. It normally promts you for password (but under some circumstances, it skips it). Maybe it fails if it has no console to prompt you.
Anyway you should NOT do this. You sould just write system("cp ./Scripts/x.sh /tmp/") and start your program with root access (supervisord probably has a way to do that).