Automating compilation and tests in Xcode - C++

I'm attempting to set up Xcode so that it will execute my C++ program several times, modifying two different macros on each iteration. A bash script that would accomplish the same thing would be something like:
#!/bin/bash
# execute the program for varying numbers of array sizes and local work sizes
for arr_size in 1000 10000 100000 1000000 2000000 4000000 6000000 8000000
do
    echo NUM_ELEMENTS = $arr_size
    for local in 8 16 32 64 128 256 512
    do
        echo LOCAL_SIZE = $local
        g++ -D NUM_ELEMENTS=$arr_size -D LOCAL_SIZE=$local project06.cpp -o project06 -lm -fopenmp
        ./project06 >> 'array_mult'$arr_size'.txt' # create a separate file for each NUM_ELEMENTS
    done
    echo
done
I've tried adding a run script phase to the build phases, but that's not quite what I want -- essentially I need to build the project multiple times, each time changing the macros and sending the program's output to a different file.
I know how to do this with regular scripting via the terminal, but I'm trying to find out how to do it via Xcode.
Thanks!
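For what it's worth, here is a minimal sketch of one way to keep the whole sweep inside Xcode; the Aggregate target and the use of Xcode's $SRCROOT variable are my assumptions, not something stated in the question. Add an Aggregate target whose only Run Script phase performs the loop, so that building that target runs the sweep:
#!/bin/bash
# Hypothetical Run Script phase body for an Xcode Aggregate target.
# SRCROOT is set by Xcode to the project directory; the loop mirrors
# the shell script above.
cd "$SRCROOT"
for arr_size in 1000 10000 100000 1000000 2000000 4000000 6000000 8000000
do
    for local in 8 16 32 64 128 256 512
    do
        g++ -D NUM_ELEMENTS=$arr_size -D LOCAL_SIZE=$local project06.cpp -o project06 -lm -fopenmp
        ./project06 >> "array_mult${arr_size}.txt"
    done
done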

Related

How to use gnu_parallel to run multiple executables and/or bash scripts?

I've recently been attempting to run my scripts in parallel in a more convenient way than opening several instances of the terminal and executing the scripts separately.
I've been trying to learn how to use gnu_parallel for the past couple of days and I am still a bit clueless; I'm hoping someone can provide a direct example.
Suppose I have a g++-compiled executable called blah.exe and a bash script called blah.sh, each of which runs perfectly fine on its own, but I want to execute them in different directories.
I've been reading
https://www.gnu.org/software/parallel/man.html#EXAMPLE:-Working-as-xargs--n1.-Argument-appending
and
https://www.biostars.org/p/182136/
but I am not totally clear about the syntax
To run these in series, I would do:
for i in 1 2 3 4
do
    cp ./blah.exe directory$i   # copy (not move), so it is still there for the next iteration
    cd directory$i
    ./blah.exe all
    cd ..
done
similarly
for i in 1 2 3 4
do
    cp ./blah.sh directory$i
    cd directory$i
    source ./blah.sh all
    cd ..
done
I am trying to understand how I would split this load across 4 logical threads in one command using parallel.
Could someone provide an example of this?
Thank you for your time.
Something like:
parallel --dry-run 'cd directory{}; ../blah.exe all; source ../blah.sh all' ::: {1..4}
No need to copy/move the executable, just run the same one.
No need to cd .. afterwards, as it's a new process each time.
Note this is not multi-threading, it is multi-processing.
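If you specifically want to cap the run at 4 simultaneous jobs (by default GNU parallel starts one job per CPU core), the -j option controls that; a small variant of the command above:
parallel -j 4 'cd directory{}; ../blah.exe all; source ../blah.sh all' ::: {1..4}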
If you want to process discontiguous directory numbers, you can use:
parallel ... ::: {1..4} 6 7 {11..14}
If you want to process all directories, you can use:
printf "%s\0" */ | parallel -0 'cd {}; pwd'
If you want to process all directories starting with FRED, you can use:
printf "%s\0" FRED*/ | parallel -0 'cd {}; pwd'

running parallel code on PC

I have Fortran code that has been parallelized with OpenMP. I want to test my code on my PC before running it on an HPC system. My PC has a dual-core CPU and I work on Linux Mint. I installed gfortran-multilib and this is my script:
#!/bin/bash
### Job name
#PBS -N pme
### Keep Output and Error
#PBS -j eo
### Specify the number of nodes and thread (ppn) for your job.
#PBS -l nodes=1:ppn=2
### Switch to the working directory;
cd $PBS_O_WORKDIR
### Run:
OMP_NUM_THREADS=$PBS_NUM_PPN
export OMP_NUM_THREADS
ulimit -s unlimited
./a.out
echo 'done'
What should I do more to run my code?
OK, I changed the script as suggested in the answers:
#!/bin/bash
### Switch to the working directory;
cd Desktop/test
### Run:
OMP_NUM_THREADS=2
export OMP_NUM_THREADS
ulimit -s unlimited
./a.out
echo 'done'
my code and its executable file are in the folder test on the Desktop, so:
cd Desktop/test
Is this correct?
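One aside (my note, not from the answers below): cd Desktop/test only resolves if the script is launched from your home directory. Using an absolute path removes that dependency:
cd "$HOME/Desktop/test"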
then I compile my simple code:
      implicit none
!$OMP PARALLEL
      write(6,*) 'hi'
!$OMP END PARALLEL
      end
by command:
gfortran -fopenmp test.f
and then run by:
./a.out
but only one "hi" is printed as output. What should I do?
(And a question about this site: in a situation like this, should I edit my post or just add a comment?)
You don't need, and probably don't want, to use that script on your PC, not even as a way of learning how such scripts work, because they are too closely tied to the specifics of each supercomputer.
I use several supercomputers/clusters and I cannot just reuse the script from one on another, because they differ so much.
On your PC you should just do:
(optional, as it is probably the default)
export OMP_NUM_THREADS=2
to set the number of OpenMP threads to 2. Adjust if you need some other number.
cd to the working directory
cd my_working_directory
Your working directory is the directory where you have the required data or where the executable resides. In your case it seems to be the directory where a.out is.
run the damn thing
ulimit -s unlimited
./a.out
That's it.
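Put together, the whole recipe is just a few lines (the same commands as above, collected into one file):
#!/bin/bash
export OMP_NUM_THREADS=2    # optional; adjust the thread count as needed
cd my_working_directory     # wherever a.out and its data live
ulimit -s unlimited
./a.out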
You can also store the standard output and the error output in files
./a.out > out.txt 2> err.txt
to mimic the supercomputer behaviour.
The PBS variables are only set when you run the script using qsub. You probably don't have that on your PC and you probably don't want to have it either.
$PBS_O_WORKDIR is the directory where you run the qsub command, unless you set it differently by other means.
$PBS_NUM_PPN is the number you indicated in #PBS -l nodes=1:ppn=2. The queue system reads that and sets this variable for you.
The script you posted is for the Portable Batch System (https://en.wikipedia.org/wiki/Portable_Batch_System) queue system. That means that the job you want to run on the HPC infrastructure first has to go into the queue system, and when the resources are available the job will run on the system.
Some of the commands (those starting with #PBS) are commands specific to this queue system. Among them, some allow the user to indicate the application process hierarchy (i.e. the number of processes and threads). Also keep in mind that, since all the PBS commands start with #, they are ignored by regular shell script execution. In the case you presented, that is given by
### Specify the number of nodes and thread (ppn) for your job.
#PBS -l nodes=1:ppn=2
which, as the comment indicates, tells the queue system that you want to run 1 process with 2 threads per process. The queue system is likely to pass these parameters on to the process launcher (srun/mpirun/aprun/... for MPI apps, in addition to OMP_NUM_THREADS for OpenMP apps).
If you want to run this job on a computer that does not have a PBS queue, you should be aware of at least two things.
1) The following command
### Switch to the working directory;
cd $PBS_O_WORKDIR
will be translated into "cd" because the environment variable PBS_O_WORKDIR is only defined within the PBS job context. So you should change this command (or execute another cd command just before the execution) to set where you want the job to run.
2) Similarly for the PBS_NUM_PPN environment variable,
OMP_NUM_THREADS=$PBS_NUM_PPN
export OMP_NUM_THREADS
this variable won't be defined unless you run within a PBS job context, so you should set OMP_NUM_THREADS manually to the value you want (2, according to your question).
If you want your Linux box environment to be like an HPC login node, you can do the following:
Make sure that your compiler supports OpenMP, and test a simple hello-world program with the OpenMP flags
Install OpenMPI on your system from your favourite package manager, or download the source/binary from the website (OpenMPI Download)
I would not recommend installing a cluster manager like Slurm for your experiments
After you are done, you can execute your MPI programs through the mpirun wrapper
mpirun -n <no_of_cores> <executable>
EDIT:
This assumes that you are running plain MPI. Note that OpenMP utilizes the cores as well: if you are running MPI+OpenMP, then n * OMP_NUM_THREADS = cores on a single node.
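For example, a hybrid run might look like this (illustrative numbers for an 8-core node, with a.out standing in for your executable):
# 2 MPI processes x 4 OpenMP threads each = 8 cores
export OMP_NUM_THREADS=4
mpirun -n 2 ./a.out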

Increase Stack Size in Dev C++ permanently

In order to implement DFS on a huge graph in Dev C++ on a Windows 7 machine, I needed to increase the default stack size of 1 MB to 16 MB. I accomplished this by creating a bat file in the same folder as the project, containing:
g++ -Wl,--stack,16777216 -o project2.exe main.cpp
Every time I compile the program in Dev C++, it assigns the default stack value, so I have to run this bat file externally and then run the exe file, which consumes a lot of time.
Is there any way to add this command directly after the compilation commands?
I tried adding this line to the Makefile, but then the program doesn't compile.
I also tried adding it to the compiler options, but the same issue remains.
Please suggest a method.
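As a sanity check (my suggestion, not from the post): however the flag ends up wired in, you can confirm it actually reached the linker by reading the stack reserve out of the PE header with objdump, which ships with the MinGW toolchain Dev C++ uses:
# 0x1000000 = 16777216 bytes = 16 MB
objdump -p project2.exe | grep -i SizeOfStackReserve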

How to tell if OpenMP is working?

I am trying to run LIBSVM in parallel mode; however, my question is about OpenMP in general. Following the LIBSVM FAQ, I have modified the code with #pragma calls to use OpenMP. I also modified the Makefile (for un*x) by adding a -fopenmp argument, so it becomes:
CFLAGS = -Wall -Wconversion -O3 -fPIC -fopenmp
The code compiles well. Since it's not my PC, I check whether OpenMP is installed with:
/sbin/ldconfig -p | grep gomp
and see that it probably is installed:
libgomp.so.1 (libc6,x86-64) => /usr/lib64/libgomp.so.1
libgomp.so.1 (libc6) => /usr/lib/libgomp.so.1
Now, when I run the program, I don't see any speed improvement. Also, when I check with "top", the process is using at most 100% CPU (there are 8 cores), and there is no CPU bottleneck (there is only one other user at 100% CPU usage). I was expecting to see more than 100% (or some other indicator) showing that the process is using multiple cores.
Is there a way to check that it is running on multiple cores?
You can use the function omp_get_num_threads(). It will return you the number of threads that are used by your program.
With omp_get_max_threads() you get the maximum number of threads available to your program. It is also the maximum of all possible return values of omp_get_num_threads(). You can explicitly set the number of threads to be used by your program with the environment variable OMP_NUM_THREADS, e.g. in bash via
$ export OMP_NUM_THREADS=8; your_program
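A quick way to see these functions in action is a throwaway probe program; this is only a sketch, and the file name omp_check.c and the thread count are arbitrary:
# write, build, and run a tiny OpenMP probe
cat > omp_check.c <<'EOF'
#include <stdio.h>
#include <omp.h>
int main(void) {
    #pragma omp parallel
    printf("thread %d of %d\n", omp_get_thread_num(), omp_get_num_threads());
    return 0;
}
EOF
gcc -fopenmp omp_check.c -o omp_check
OMP_NUM_THREADS=8 ./omp_check    # expect 8 lines if OpenMP is active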

cmake: compilation statistics

I need to figure out which translation units need to be restructured to improve compile times. How do I get hold of the compilation time for my translation units, using CMake?
The following properties can be used to time compiler and linker invocations:
RULE_LAUNCH_COMPILE
RULE_LAUNCH_CUSTOM
RULE_LAUNCH_LINK
These properties can be set globally, per directory, and per target. That way you can have only a subset of your targets (say, tests) affected by them. You can also use a different "launcher" for each target, which may be useful as well.
Keep in mind that using "time" directly is not portable, because this utility is not available on all platforms supported by CMake. However, CMake provides "time" functionality in its command-line tool mode. For example:
# Set global property (all targets are impacted)
set_property(GLOBAL PROPERTY RULE_LAUNCH_COMPILE "${CMAKE_COMMAND} -E time")
# Set property for my_target only
set_property(TARGET my_target PROPERTY RULE_LAUNCH_COMPILE "${CMAKE_COMMAND} -E time")
Example CMake output:
[ 65%] Built target my_target
[ 67%] Linking C executable my_target
Elapsed time: 0 s. (time), 0.000672 s. (clock)
Note that as of CMake 3.4 only the Makefile and Ninja generators support these properties.
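For instance, a build configured with one of those generators (Ninja here) will have the launcher applied to every compile rule:
mkdir build && cd build
cmake -G Ninja ..    # the Ninja generator honours RULE_LAUNCH_COMPILE
ninja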
Also note that as of CMake 3.4, cmake -E time has problems with spaces inside arguments. For example:
cmake -E time cmake "-GUnix Makefiles"
will be interpreted as:
cmake -E time cmake "-GUnix" "Makefiles"
I submitted a patch that fixes this problem.
I would expect to replace the compiler (and/or linker) with 'time original-cmd'. Using plain 'make', I'd say:
make CC="time gcc"
The 'time' program runs the command and reports how long it took. The equivalent mechanism would work with 'cmake'. If you need to capture the command as well as the time, you can write your own command analogous to time (a shell script would do) that records the data you want in the way you want.
To expand on the previous answer, here's a concrete solution that I just wrote up. Which is to say, it definitely works in practice, not just in theory, but it has been used by only one person for approximately three minutes, so it probably has some infelicities.
#!/bin/bash
# wrap clang: run it under `time`, then append the command line and its timing to a log
{ time clang "$@"; } 2> >(cat <(echo "clang $@") - >> /tmp/results.txt)
I put the above script in /tmp/time-clang and then ran
chmod +x /tmp/time-clang
cmake .. -DCMAKE_C_COMPILER=/tmp/time-clang
make
You can use -DCMAKE_CXX_COMPILER= to hook the C++ compiler in exactly the same way.
I didn't use make -j8 because I didn't want the results to get interleaved in weird ways.
I had to put an explicit hashbang #!/bin/bash on my script because the default shell (dash, I think?) on Ubuntu 12.04 wasn't happy with those redirection operators.
I think that the best option is to use:
set_property(GLOBAL PROPERTY RULE_LAUNCH_COMPILE "time -v")
set_property(GLOBAL PROPERTY RULE_LAUNCH_LINK "time -v")
Despite what has been said above:
Keep in mind, that using "time" directly is not portable, because this utility is not available on all platforms supported by CMake. However, CMake provides "time"...
https://stackoverflow.com/a/34888291/5052296
If your system contains it, you will get much better results with the -v flag (the flag belongs to GNU time, typically /usr/bin/time, not to the shell builtin).
e.g.
time -v /usr/bin/c++ CMakeFiles/basic_ex.dir/main.cpp.o -o basic_ex
Command being timed: "/usr/bin/c++ CMakeFiles/basic_ex.dir/main.cpp.o -o basic_ex"
User time (seconds): 0.07
System time (seconds): 0.01
Percent of CPU this job got: 33%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.26
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 16920
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 6237
Voluntary context switches: 7
Involuntary context switches: 23
Swaps: 0
File system inputs: 0
File system outputs: 48
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0