I need to figure out which translation units need to be restructured to improve compile times. How do I measure the compilation time of my translation units when building with CMake?
The following properties can be used to time compiler and linker invocations:
RULE_LAUNCH_COMPILE
RULE_LAUNCH_CUSTOM
RULE_LAUNCH_LINK
These properties can be set globally, per directory, or per target. That way you can have only a subset of your targets (say, tests) impacted by the property. You can also use a different "launcher" for each target, which could be useful too.
Keep in mind that using "time" directly is not portable, because this utility is not available on all platforms supported by CMake. However, CMake provides "time" functionality in its command-line tool mode. For example:
# Set global property (all targets are impacted)
set_property(GLOBAL PROPERTY RULE_LAUNCH_COMPILE "${CMAKE_COMMAND} -E time")
# Set property for my_target only
set_property(TARGET my_target PROPERTY RULE_LAUNCH_COMPILE "${CMAKE_COMMAND} -E time")
Example CMake output:
[ 65%] Built target my_target
[ 67%] Linking C executable my_target
Elapsed time: 0 s. (time), 0.000672 s. (clock)
Note that as of CMake 3.4 only the Makefile and Ninja generators support these properties.
Also note that as of CMake 3.4, cmake -E time has problems with spaces inside arguments. For example:
cmake -E time cmake "-GUnix Makefiles"
will be interpreted as:
cmake -E time cmake "-GUnix" "Makefiles"
I submitted a patch that fixes this problem.
I would expect to replace the compiler (and/or linker) with 'time original-cmd'. Using plain 'make', I'd say:
make CC="time gcc"
The 'time' program would run the command and report on the time it took. The equivalent mechanism would work with 'cmake'. If you need to capture the command as well as the time, then you can write your own command analogous to time (a shell script would do) that records the data you want in the way you want.
To expand on the previous answer, here's a concrete solution that I just wrote up — which is to say, it definitely works in practice, not just in theory, but it has been used by only one person for approximately three minutes, so it probably has some infelicities.
#!/bin/bash
# Log each compile command and its timing to /tmp/results.txt ("$@" passes all arguments through)
{ time clang "$@"; } 2> >(cat <(echo "clang $@") - >> /tmp/results.txt)
I put the above two lines in /tmp/time-clang and then ran
chmod +x /tmp/time-clang
cmake .. -DCMAKE_C_COMPILER=/tmp/time-clang
make
You can use -DCMAKE_CXX_COMPILER= to hook the C++ compiler in exactly the same way.
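For completeness, here is what that C++ wrapper might look like (a sketch along the same lines; the /tmp/time-clang++ path and clang++ are my assumptions, adjust to your setup):
#!/bin/bash
# /tmp/time-clang++: log each C++ compile command and its timing, same idea as above
{ time clang++ "$@"; } 2> >(cat <(echo "clang++ $@") - >> /tmp/results.txt)
Then chmod +x /tmp/time-clang++ and configure with cmake .. -DCMAKE_CXX_COMPILER=/tmp/time-clang++.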
I didn't use make -j8 because I didn't want the results to get interleaved in weird ways.
I had to put an explicit hashbang #!/bin/bash on my script because the default shell (dash, I think?) on Ubuntu 12.04 wasn't happy with those redirection operators.
I think that the best option is to use:
set_property(GLOBAL PROPERTY RULE_LAUNCH_COMPILE "time -v")
set_property(GLOBAL PROPERTY RULE_LAUNCH_LINK "time -v")
Despite what has been said above:
Keep in mind, that using "time" directly is not portable, because this utility is not available on all platforms supported by CMake. However, CMake provides "time"...
https://stackoverflow.com/a/34888291/5052296
If your system contains it, you will get much more detailed results with the -v flag.
e.g.
time -v /usr/bin/c++ CMakeFiles/basic_ex.dir/main.cpp.o -o basic_ex
Command being timed: "/usr/bin/c++ CMakeFiles/basic_ex.dir/main.cpp.o -o basic_ex"
User time (seconds): 0.07
System time (seconds): 0.01
Percent of CPU this job got: 33%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.26
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 16920
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 6237
Voluntary context switches: 7
Involuntary context switches: 23
Swaps: 0
File system inputs: 0
File system outputs: 48
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
I don’t understand the instructions given here and here.
Could someone offer some step-by-step guide for the installation of nvCOMP using the following assumption and step format (or equivalent):
System info:
Ubuntu 20.04
RTX-3060
NVIDIA driver 470.82.01
CUDA 11.4
GCC 9.4.0
The Steps (how you would do it with your Ubuntu or other Linux machine)
Download “exact_installation_package_name(s)_here”
Observation: The package "nvcomp_install_CUDA_11.x.tgz" from NVIDIA has exactly the structure described here. However, this package seems to be different from the "nvcomp" folder obtained by running git clone https://github.com/NVIDIA/nvcomp.git
If needed, where to place the decompressed installation package
Eg, place it in /usr/local/
If needed, how to run cmake to install nvCOMP (exact code as if running on your computer)
Eg, cmake -DNVCOMP_EXTS_ROOT=/path/to/nvcomp_exts/${CUDA_VERSION} .. make -j (code from this site)
However, is CUDA_VERSION a literal string or a placeholder for, say, CUDA_11.4?
Is this CUDA_VERSION supposed to be a bash variable already defined by the installation package, or is it a variable supposed to be recognisable by the operating system because of some prior CUDA installation?
Besides, what exactly is nvcomp_exts or what does it refer to?
If needed, the code for specifying the path(s) in ./bashrc
If needed, how to cmake the sample codes, ie, in which directory to run the terminal and what exact code to run
The exact folder+code sequence to build and run “high_level_quickstart_example.cpp”, which comes with the installation package.
Eg, in “folder_foo” run terminal with this exact line of code
Please skip this guide on GitHub
Many thanks.
I will answer my own question.
System info
Here is the system information obtained from the command line:
uname -r: 5.15.0-46-generic
lsb_release -a: Ubuntu 20.04.5 LTS
nvcc --version: Cuda compilation tools, release 10.1, V10.1.243
nvidia-smi:
Two Tesla K80 (2-in-1 card) and one GeForce (Gigabyte RTX 3060 Vision 12G rev . 2.0)
NVIDIA-SMI 470.82.01
Driver Version: 470.82.01
CUDA Version: 11.4
cmake --version: cmake version 3.22.5
make --version: GNU Make 4.2.1
lscpu: Xeon CPU E5-2680 V4 @ 2.40GHz - 56 CPU(s)
Observation
Although there are two GPUs installed in the server, nvCOMP only works with the RTX.
The Steps
Perhaps "installation" is a misnomer. One only needs to properly compile the downloaded nvCOMP files and run the resulting executables.
Step 1: The nvCOMP library
Download the nvCOMP library from https://developer.nvidia.com/nvcomp.
The file I downloaded was named nvcomp_install_CUDA_11.x.tgz. And I left the extracted folder in the Downloads directory and renamed it nvcomp.
Step 2: The nvCOMP test package on GitHub
Download it from https://github.com/NVIDIA/nvcomp. Click the green "Code" icon, then click "Download ZIP".
By default, the downloaded zip file is called nvcomp-main.zip. And I left the extracted folder, named nvcomp-main, in the Downloads directory.
Step 3: The NVIDIA CUB library on GitHub
Download it from https://github.com/nvidia/cub. Click the green "Code" icon, then click "Download ZIP".
By default, the downloaded zip file is called cub-main.zip. And I left the extracted folder, named cub-main, in the Downloads directory.
There is no "installation" of the CUB library other than making the folder path "known", ie available, to the calling program.
Comments: The nvCOMP GitHub site did not seem to explain that the CUB library was needed to run nvCOMP, and I only found that out from an error message during an attempted compilation of the test files in Step 2.
Step 4: "Building CPU and GPU Examples, GPU Benchmarks provided on Github"
The nvCOMP GitHub landing page has a section with the exact name as this Step. The instructions could have been more detailed.
Step 4.1: cmake
In the Downloads directory are the folders nvcomp (the Step 1 nvCOMP library), nvcomp-main (Step 2), and cub-main (Step 3).
Start a terminal and then go inside nvcomp-main, ie, go to /your-path/Downloads/nvcomp-main
Run cmake -DCMAKE_PREFIX_PATH=/your-path/Downloads/nvcomp -DCUB_DIR=/your-path/Downloads/cub-main
This cmake step sets up the build files for the next "make" step.
During cmake, a harmless yellow cmake warning appeared.
There was also a harmless printout "-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed" per this thread.
The last few printout lines from cmake variously stated it found Threads, nvcomp, ZLIB (on my system) and it was done with "Configuring" and "Build files have been written".
Step 4.2: make
Run make in the same terminal as above.
Compare the folder tree before and after the build to see what files have been generated.
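As a side note, if your machine has multiple cores, the make step can be parallelized; this is plain GNU make, nothing nvCOMP-specific (it assumes nproc is available on your system):
make -j"$(nproc)"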
Step 5: Running the examples/benchmarks
Let's run the "built-in" example before running the benchmarks with the (now outdated) Fannie Mae single-family loan performance data from NVIDIA's RAPIDS repository.
Check if there are executables in /your-path/Downloads/nvcomp-main/bin. These are the executables created by the cmake and make steps above.
You can try running these executables on your to-be-compressed files; they are built with different compression algorithms and functionalities, and the name of each executable indicates the algorithm used and/or its functionality.
Some of the executables require the files to be of a certain size, eg, the "benchmark_cascaded_chunked" executable requires the target file's size to be a multiple of 4 bytes. I have not tested all of these executables.
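If you want to feed such an executable a file whose size is not a multiple of 4 bytes, one workaround sketch is to zero-pad it first; this assumes GNU coreutils (stat, truncate), and my-file.bin is a placeholder name:
# pad my-file.bin with zero bytes up to the next multiple of 4
size=$(stat -c%s my-file.bin)
pad=$(( (4 - size % 4) % 4 ))
[ "$pad" -gt 0 ] && truncate -s +"$pad" my-file.bin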
Step 5.1: CPU compression examples
Per https://github.com/NVIDIA/nvcomp
Start a terminal (anywhere)
Run time /your-path/Downloads/nvcomp-main/bin/gdeflate_cpu_compression -f /full-path-to-your-target/my-file.txt
Here are the results of running gdeflate_cpu_compression on an updated Fannie Mae loan data file "2002Q1.csv" (11GB).
Similarly, change the name of the executable to run lz4_cpu_compression or lz4_cpu_decompression
Step 5.2: The benchmarks with the Fannie Mae files from NVIDIA Rapids
Apart from following the NVIDIA instructions here, it seems the "benchmark" executables in the above "bin" directory can be run with "any" file. Just use the executable in the same way as in Step 5.1 and adhere to the particular executable specifications.
Below is one example following the NVIDIA instruction.
Long story short, the nvcomp-main(Step 2) test package contains the files to (i) extract a column of homogeneous data from an outdated Fannie Mae loan data file, (ii) save the extraction in binary format, and (iii) run the benchmark executable(s) on the binary extraction.
The Fannie Mae single-family loan performance data files, old or new, all use "|" as the delimiter. In the outdated Rapids version, the first column, indexed as column "0" in the code (zero-based numbering), contains the 12-digit loan IDs for the loans sampled from the (real) Fannie Mae loan portfolio. In the new Fannie Mae data files from the official Fannie Mae site, the loan IDs are in column 2 and the data files have a csv file extension.
Download the dataset "1 Year" Fannie Mae data, not the "1GB Splits*" variant, by following the link from here, or by going directly to RAPIDS
Place the downloaded mortgage_2000.tgz anywhere and unzip it with tar -xvzf mortgage_2000.tgz.
There are four txt files in /mortgage_2000/perf. I will use Performance_2000Q1.txt as an example.
Check if python is installed on the system
Check if text_to_binary.py is in /nvcomp-main/benchmarks
Start a terminal (anywhere)
As shown below, use the python script to extract the first column, indexed "0", with format long, from Performance_2000Q1.txt, and put the .bin output file somewhere.
Run time python /your-path/Downloads/nvcomp-main/benchmarks/text_to_binary.py /your-other-path-to/mortgage_2000/perf/Performance_2000Q1.txt 0 long /another-path/2000Q1-col0-long.bin
For comparison of the benchmarks, run time python /your-path/Downloads/nvcomp-main/benchmarks/text_to_binary.py /your-other-path-to/mortgage_2000/perf/Performance_2000Q1.txt 0 string /another-path/2000Q1-col0-string.bin
Run the benchmarking executables with the target bin files as shown at the bottom of the web page of the NVIDIA official guide
Eg, /your-path/Downloads/nvcomp-main/bin/benchmark_hlif lz4 -f /another-path/2000Q1-col0-long.bin
Just make sure the operating system knows where the executable and the target file are.
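One optional way to do that is to put the bin directory on your PATH for the current shell session (paths assumed from the steps above):
export PATH="$PATH:/your-path/Downloads/nvcomp-main/bin"
benchmark_hlif lz4 -f /another-path/2000Q1-col0-long.bin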
Step 5.3: The high_level_quickstart_example and low_level_quickstart_example
These two executables are in /nvcomp-main/bin
They are completely self-contained. Just run, eg, high_level_quickstart_example without any input arguments. See the corresponding C++ source code in /nvcomp-main/examples and the official nvCOMP guides on GitHub.
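For instance, assuming the paths from the steps above, running the high-level example is just:
/your-path/Downloads/nvcomp-main/bin/high_level_quickstart_example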
Observations after some experiments
This could be another long thread but let's keep it short. Note that NVIDIA used various A-series cards for its benchmarks and I used a GeForce RTX 3060.
Speed
The python script is slow. It took 4m12.456s to extract the loan ID column from an 11.8 GB Fannie Mae data file (with 108 columns) using format "string".
In contrast, R with data.table took 25.648 seconds to do the same.
With the outdated "Performance_2000Q1.txt" (0.99 GB) tested above, the python script took 32.898s whereas R took 26.965s to do the same extraction.
Compression ratio
"Bloated" python outputs.
The R-output "string.txt" files are generally a quarter of the size of the corresponding python-output "string.bin" files.
Applying the executables to the R-output files achieved much better compression ratio and throughputs than to the python-output files.
Eg, running benchmark_hlif lz4 -f 2000Q1-col0-string.bin with the python output vs running benchmark_hlif lz4 -f 2000Q1-col0-string.txt with the R output
Uncompressed size: 436,544,592 vs 118,230,827 bytes
Compressed size: 233,026,108 vs 4,154,261 bytes
Compression ratio: 1.87 vs 28.46
Compression throughput (GB/s): 2.42 vs 18.96
Decompression throughput (GB/s): 8.86 vs 91.50
Wall time: 2.805 vs 1.281s
Overall performance: accounting for file size and memory limits
Use of the nvCOMP library is limited by GPU memory: no more than 12GB on the RTX 3060 tested. Depending on the compression algorithm, an 8GB target file can easily trigger a stop with cudaErrorMemoryAllocation: out of memory
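If you still want to process such a file, one crude workaround sketch is to split it into chunks that fit in GPU memory and compress each chunk separately; this assumes GNU coreutils split, and the 2 GiB chunk size and file name are arbitrary examples:
# split the csv into 2 GiB pieces: 2002Q1.csv.part00, 2002Q1.csv.part01, ...
split -b 2G -d 2002Q1.csv 2002Q1.csv.part
for p in 2002Q1.csv.part*; do
  /your-path/Downloads/nvcomp-main/bin/benchmark_hlif lz4 -f "$p"
done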
In both speed and compression ratio, pigz trumped the tested nvCOMP executables when the target files were the new Fannie Mae data files containing 108 columns of strings and numbers.
I have been trying to install the ONOS controller on my Ubuntu VM on my Mac, following the steps in this link: Download ONOS code & Build ONOS.
However, the build fails after executing the following command:
~/onos$ bazel build onos
The above command outputs the following:
Starting local Bazel server and connecting to it...
INFO: Analysed target //:onos (759 packages loaded, 12923 targets configured).
INFO: Found 1 target...
.
.
.
enconfig-native; [2,128 / 2,367] //models/openconfig:onos-models-openconfig-native; ERROR: /home/mohamedzidan/onos/models/openconfig/BUILD:11:1: Building models/openconfig/libonos-models-openconfig-native-class.jar (2 source jars) failed (Exit 1)
[2,128 / 2,367] //models/openconfig:onos-models-openconfig-native; An exception has occurred in the compiler (10.0.1). Please file a bug against the Java compiler via the Java bug reporting page (http://bugreport.java.com) after checking the Bug Database (http://bugs.java.com) for duplicates. Include your program and the following diagnostic in your report. Thank you.
java.lang.OutOfMemoryError: Java heap space
at jdk.compiler/com.sun.tools.javac.util.ArrayUtils.ensureCapacity(ArrayUtils.java:60)
at jdk.compiler/com.sun.tools.javac.util.SharedNameTable.fromUtf(SharedNameTable.java:132)
at jdk.compiler/com.sun.tools.javac.util.Names.fromUtf(Names.java:392)
at jdk.compiler/com.sun.tools.javac.util.ByteBuffer.toName(ByteBuffer.java:159)
at jdk.compiler/com.sun.tools.javac.jvm.ClassWriter$CWSignatureGenerator.toName(ClassWriter.java:320)
at jdk.compiler/com.sun.tools.javac.jvm.ClassWriter$CWSignatureGenerator.access$300(ClassWriter.java:266)
at jdk.compiler/com.sun.tools.javac.jvm.ClassWriter.typeSig(ClassWriter.java:335)
at jdk.compiler/com.sun.tools.javac.jvm.ClassWriter.writeMethod(ClassWriter.java:1153)
at jdk.compiler/com.sun.tools.javac.jvm.ClassWriter.writeMethods(ClassWriter.java:1653)
at jdk.compiler/com.sun.tools.javac.jvm.ClassWriter.writeClassFile(ClassWriter.java:1761)
at jdk.compiler/com.sun.tools.javac.jvm.ClassWriter.writeClass(ClassWriter.java:1679)
at jdk.compiler/com.sun.tools.javac.main.JavaCompiler.genCode(JavaCompiler.java:743)
at jdk.compiler/com.sun.tools.javac.main.JavaCompiler.generate(JavaCompiler.java:1641)
at jdk.compiler/com.sun.tools.javac.main.JavaCompiler.generate(JavaCompiler.java:1609)
at jdk.compiler/com.sun.tools.javac.main.JavaCompiler.compile(JavaCompiler.java:959)
at jdk.compiler/com.sun.tools.javac.api.JavacTaskImpl.lambda$doCall$0(JavacTaskImpl.java:100)
at jdk.compiler/com.sun.tools.javac.api.JavacTaskImpl$$Lambda$97/1225568095.call(Unknown Source)
at jdk.compiler/com.sun.tools.javac.api.JavacTaskImpl.handleExceptions(JavacTaskImpl.java:142)
at jdk.compiler/com.sun.tools.javac.api.JavacTaskImpl.doCall(JavacTaskImpl.java:96)
at jdk.compiler/com.sun.tools.javac.api.JavacTaskImpl.call(JavacTaskImpl.java:90)
at com.google.devtools.build.buildjar.javac.BlazeJavacMain.compile(BlazeJavacMain.java:113)
at com.google.devtools.build.buildjar.SimpleJavaLibraryBuilder$$Lambda$70/778731861.invokeJavac(Unknown Source)
at com.google.devtools.build.buildjar.ReducedClasspathJavaLibraryBuilder.compileSources(ReducedClasspathJavaLibraryBuilder.java:57)
at com.google.devtools.build.buildjar.SimpleJavaLibraryBuilder.compileJavaLibrary(SimpleJavaLibraryBuilder.java:116)
at com.google.devtools.build.buildjar.SimpleJavaLibraryBuilder.run(SimpleJavaLibraryBuilder.java:123)
at com.google.devtools.build.buildjar.BazelJavaBuilder.processRequest(BazelJavaBuilder.java:105)
at com.google.devtools.build.buildjar.BazelJavaBuilder.runPersistentWorker(BazelJavaBuilder.java:67)
at com.google.devtools.build.buildjar.BazelJavaBuilder.main(BazelJavaBuilder.java:45)
[2,128 / 2,367] //models/openconfig:onos-models-openconfig-native; Target //:onos failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 1386.685s, Critical Path: 117.31s
INFO: 379 processes: 125 linux-sandbox, 254 worker.
FAILED: Build did NOT complete successfully
Your output shows java.lang.OutOfMemoryError: Java heap space. You can increase the amount of memory available to javac with something like this:
BAZEL_JAVAC_OPTS="-J-Xms384m -J-Xmx512m"
If that still doesn't work, try progressively increasing sizes for -Xmx. This issue is discussed further at:
https://github.com/bazelbuild/bazel/issues/1308
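For example, one way to apply BAZEL_JAVAC_OPTS (a sketch; it assumes your Bazel version honors this variable as discussed in the linked issue, and the sizes are starting points to tune):
export BAZEL_JAVAC_OPTS="-J-Xms384m -J-Xmx1g"
bazel clean
bazel build onos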
Summary
If bazel runs out of memory while building, and you see this error:
java.lang.OutOfMemoryError: Java heap space
...then do this:
Increase your RAM or your virtual memory swap file size, to emulate having more RAM (details on how to do this are below).
From now on, build with a bazel command like the following to give Bazel more heap space (RAM) while building. In this case I am giving it a 32GB maximum heap:
# Do this to give Bazel up to 32GB of RAM while building
time bazel --host_jvm_args=-Xmx32g build //...
# ...instead of doing this
time bazel build //...
Details
If Bazel fails with any version of the following error, it's because it ran out of heap space while trying to build.
Example error:
java.lang.OutOfMemoryError: Java heap space
I see that error in the output you pasted. Although it's not well known, some monster-sized projects and mono-repos can require a heap of 16GB or more, so I recommend you just create a massive 32GB~64GB swap file (virtual memory) on your Linux build machine and let the build use it!
CAUTION: if you have a standard HDD (spinning hard disk drive), this may cause the build to run dozens or even hundreds of times slower than it would using physical RAM, because HDDs are horribly slow!
BUT: if you have a 2.5" or 3.5" SSD (solid state drive), it works OK, and better still if you have an m.2 form-factor SSD! An m.2 SSD is fast enough that you can get away with huge swap files being used in place of RAM all the time.
With a top-of-the-line internal m.2 SSD, I expect a build that uses virtual memory to be only ~2x slower than one using physical RAM alone (of the same size). With a super slow spinning HDD, however, the same build that takes 2 hrs using a swap file on an internal m.2 SSD might take multiple days or more.
Your results may vary, of course, but the slower you expect your virtual memory (swap file) to be, the smaller you should make Bazel's JVM heap (to use less of that virtual memory).
Increase your system’s swap file (virtual memory) to at least 32~64 GB. To add or remove a swapfile, follow the detailed instructions here: https://linuxize.com/post/how-to-add-swap-space-on-ubuntu-18-04/. UPDATE: use my own instructions here instead: How do I increase the size of swapfile without removing it in the terminal?. My instructions avoid the pitfalls of fallocate by using dd instead, as I explain in my answer there.
In short, here is how to add a swapfile:
sudo dd if=/dev/zero of=/swapfile count=64 bs=1G # Create a 64 GiB file
sudo mkswap /swapfile # turn this new file into swap space
sudo chmod 0600 /swapfile # only let root read from/write to it,
# for security
sudo swapon /swapfile # enable it
swapon --show # verify this new 64GB swap file is
# now active
sudo gedit /etc/fstab # edit the /etc/fstab file to make these
# changes persistent (load them each boot)
# ADD this line to bottom (w/out the # comment symbol):
# /swapfile none swap sw 0 0
cat /proc/sys/vm/swappiness # not required: verify your systems
# "swappiness" value. Note: values now range 0 to 200 (they used to only
# go up to 100), and have a default value of 60. I highly recommend
# you follow my instructions here to set your swappiness to 0,
# however, to improve your system's performance:
# https://askubuntu.com/a/1445347/327339
To resize or delete your swapfile: if you ever need to resize the swap file you made above, you can delete it like this:
sudo swapoff -v /swapfile # turn swap file off
sudo swapon --show # verify the swap file is off
free -h # you can also look at this as an
# indication the swap file is off
sudo rm /swapfile # remove the swap file
Then, you can either follow the instructions above again to recreate it at a new size, or if you are permanently deleting it you'll need to edit your /etc/fstab file to remove the /swapfile none swap sw 0 0 line you previously added to the bottom of it.
Add --host_jvm_args=-Xmx32g to any bazel command, right after the word bazel. This sets the max Java Virtual Machine heap (the bazel build heap in this case) to 32GB, which spills into your swap file once your physical RAM is full. If you have a high-speed SSD, which handles swap surprisingly well, expect to wait a few hrs max for your build to complete, depending on the repo size. If you have an old spinning HDD, a repo that takes 2 hrs to build with a swap file on an internal m.2 SSD might take up to several days with a swap file on a slow spinning HDD, especially if it's an external rather than internal HDD.
Here is a sample full bazel command with this bazel startup option added, to build an entire repo:
time bazel --host_jvm_args=-Xmx32g build //...
...instead of this:
time bazel build //...
The time prefix just prints a readable summary of how long the build took (I like it). Just be sure to set the max JVM heap allotted to bazel for any bazel build command by putting --host_jvm_args=-Xmx32g (or similar) after the word bazel whenever you need it.
Note that setting the max heap like we are doing here with -Xmx is NOT the same thing as setting the initial heap, which others might do with -Xms. Setting only the max heap still starts from the default initial heap but lets it grow as needed. The other answer shows setting both via an environment variable.
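If you do want to pin both, --host_jvm_args can be repeated, since each occurrence adds one JVM argument (the sizes here are arbitrary examples):
# set the initial heap to 8GB and the max heap to 32GB for this build
time bazel --host_jvm_args=-Xms8g --host_jvm_args=-Xmx32g build //...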
Done!
References:
[My own answer] Ask Ubuntu: How do I increase the size of swapfile without removing it in the terminal?
https://linuxize.com/post/how-to-add-swap-space-on-ubuntu-18-04/
https://serverfault.com/questions/684771/best-way-to-disable-swap-in-linux/684792#684792
My answer: How do I configure swappiness?
See also:
https://github.com/bazelbuild/bazel/issues/1308
I'm attempting to set up Xcode so that it will execute my C++ program several times, modifying two different macros on each iteration. A bash script that would accomplish the same thing would be something like:
#!/bin/bash
# execute the program for varying numbers of array sizes and local work sizes
for arr_size in 1000 10000 100000 1000000 2000000 4000000 6000000 8000000
do
echo NUM_ELEMENTS = $arr_size
for local in 8 16 32 64 128 256 512
do
echo LOCAL_SIZE = $local
g++ -D NUM_ELEMENTS=$arr_size -D LOCAL_SIZE=$local project06.cpp -o project06 -lm -fopenmp
./project06 >> 'array_mult'$arr_size'.txt' #create a separate file for each NUM_ELEMENTS.
done
echo
done
I've tried adding a run script phase to the build phases, but that's not quite what I want -- essentially I need to build the project multiple times, each time changing macros and outputting the program results to a varying file.
I know how to do this with regular scripting via the terminal, but I'm trying to find out how to do it via Xcode.
Thanks!
I'm backporting ffmpeg to an older version of Debian.
Everything is going well, but it's so slow.
I am running dpkg-buildpackage -us -uc
with a debian rules file that looks like this:
#!/usr/bin/make -f
%:
dh $@
override_dh_auto_configure:
./configure
I notice this is only running on one core.
Is there anything like make -j 4 that I could use to speed this up?
I've been using this guide, but I don't see anything about speeding up the build step:
https://www.debian.org/doc/manuals/maint-guide/
Sure, you can use -j 4 as an argument to dpkg-buildpackage. It is documented in the man page. The relevant section is:
-jjobs Number of jobs allowed to be run simultaneously, equivalent to
the make(1) option of the same name. Will add itself to
the MAKEFLAGS environment variable, which should cause all
subsequent make invocations to inherit the option. Also adds
parallel=jobs to the DEB_BUILD_OPTIONS environment variable which
allows debian/rules files to use this information for their own
purposes. The parallel=jobs in DEB_BUILD_OPTIONS environment
variable will override the -j value if this option is given.
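So, for the command in the question, something like this should do it (the -j value is an example; the parallelism also depends on the upstream build system honoring MAKEFLAGS):
dpkg-buildpackage -us -uc -j4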
I am trying to run LIBSVM in parallel mode; however, my question is about OpenMP in general. Following the LIBSVM FAQ, I have modified the code with #pragma directives to use OpenMP. I also modified the Makefile (for un*x) by adding a -fopenmp argument, so it becomes:
CFLAGS = -Wall -Wconversion -O3 -fPIC -fopenmp
The code compiles well. I check (since it's not my PC) whether OpenMP is installed by:
/sbin/ldconfig -p | grep gomp
and see that it is -probably- installed:
libgomp.so.1 (libc6,x86-64) => /usr/lib64/libgomp.so.1
libgomp.so.1 (libc6) => /usr/lib/libgomp.so.1
Now, when I run the program, I don't see any speed improvement. Also, when I check with "top", the process is using at most 100% CPU (there are 8 cores), and there is no CPU bottleneck (only one other user at 100% CPU usage). I was expecting to see more than 100% (or some other indicator) showing that the process is using multiple cores.
Is there a way to check that it is using multiple cores?
You can use the function omp_get_num_threads(). It will return the number of threads that are used by your program.
With omp_get_max_threads() you get the maximum number of threads available to your program. It is also the maximum of all possible return values of omp_get_num_threads(). You can explicitly set the number of threads to be used by your program with the environment variable OMP_NUM_THREADS, e.g. in bash via
$ export OMP_NUM_THREADS=8; your_program
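As a quick shell-side sanity check while your program is running, you can also count its threads via the NLWP (number of lightweight processes) column; your_program is a placeholder as above:
# prints a number greater than 1 if the process has multiple threads
ps -o nlwp= -p "$(pgrep -n your_program)"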