I'm trying to backup a large directory (several terabytes) to google cloud, using the following command:
gsutil -m rsync -r -e local_dir/ gs://target/bucket
In summary, run in parallel (-m), recursively (-r) search the directory local_dir/ (don't follow symlinks -e), and store it remotely in the bucket gs://target/bucket.
This operation completes succesfully:
[666.4k/666.4k files][ 6.3 TiB / 6.3 TiB] 100% Done
Operation completed over 666.4k objects/6.3 TiB.
One thing that worries me, however, is that the folder size is different when I run du:
$ du --max-depth 1 -h local_dir/
...
7.6T local_dir
Can anyone explain where the discrepancy of more than 1 TiB comes from, compared to what gsutil transfered, and what du reports?
Part of the difference is that Linux du is reporting in units of terabytes (10^12 bytes), while gsutil cp is reporting in units of tebibytes (2^40). Thus, the Linux du units are 1.0995 times larger than the gsutil cp units. Additionally, directories and inodes consume space beyond the file data bytes. For example, if you run these commands:
mkdir tmp
cd tmp
for f in {1..1000};do
touch $f
done
du -h
it reports 24K used, even though each of the files is empty (so, an average of 2.4k bytes per inode). And if you remove the temp files and run du -s on the directory, it consumes 4k bytes. So, your 666.4k files would consume approx 16 MB plus however much for the number of directories that were contained. Also, the amount used may differ depending on the type of filesystem you're using. The numbers I reported above are for an ext4 filesystem running on Debian Linux.
Related
My multipassd.log had grown to 200+ GB before i noticed (because my disk was full). Stupid as i was i deleted the log with rm -rf multipassd.log (file was so big i couldn't open it). This apparently deleted the file without freeing the space on disk. So now i have 200+ GB of inaccessible disk space.
The space does not show up as used when checking the file system with the du command, even from the root directory. I also downloaded DaisyDisk, which showed that there were 200 GB of "hidden files" but couldn't access or delete them, even with all privileges enabled.
Eventually I fixed it, but if anyone can explain why rm -rf multipassd.log failed to free the disk space, that would be appreciated :)
After messing around for a couple hours, I fixed it by opening Console then creating a new log file and reloading the multipass launcher daemon with
$ sudo touch /Library/Logs/Multipass/multipassd.log
$ sudo launchctl unload /Library/LaunchDaemons/com.canonical.multipassd.plist
$ sudo launchctl load /Library/LaunchDaemons/com.canonical.multipassd.plist
Then I started an instance of Multipass and cleared the (almost empty) log from the already opened console. That freed up the disk space :)
I have been trying to install the ONOS controller on my Ubuntu VM on my MAC computer following the steps in this link: Download ONOS code & Build ONOS.
However, the building process is not successful after executing the following command:
~/onos$ bazel build onos
The above command outputs the following:
Starting local Bazel server and connecting to it...
INFO: Analysed target //:onos (759 packages loaded, 12923 targets configured).
INFO: Found 1 target...
.
.
.
enconfig-native; [2,128 / 2,367] //models/openconfig:onos-models-openconfig-native; ERROR: /home/mohamedzidan/onos/models/openconfig/BUILD:11:1: Building models/openconfig/libonos-models-openconfig-native-class.jar (2 source jars) failed (Exit 1)
[2,128 / 2,367] //models/openconfig:onos-models-openconfig-native; An exception has occurred in the compiler (10.0.1). Please file a bug against the Java compiler via the Java bug reporting page (http://bugreport.java.com) after checking the Bug Database (http://bugs.java.com) for duplicates. Include your program and the following diagnostic in your report. Thank you.
java.lang.OutOfMemoryError: Java heap space
at jdk.compiler/com.sun.tools.javac.util.ArrayUtils.ensureCapacity(ArrayUtils.java:60)
at jdk.compiler/com.sun.tools.javac.util.SharedNameTable.fromUtf(SharedNameTable.java:132)
at jdk.compiler/com.sun.tools.javac.util.Names.fromUtf(Names.java:392)
at jdk.compiler/com.sun.tools.javac.util.ByteBuffer.toName(ByteBuffer.java:159)
at jdk.compiler/com.sun.tools.javac.jvm.ClassWriter$CWSignatureGenerator.toName(ClassWriter.java:320)
at jdk.compiler/com.sun.tools.javac.jvm.ClassWriter$CWSignatureGenerator.access$300(ClassWriter.java:266)
at jdk.compiler/com.sun.tools.javac.jvm.ClassWriter.typeSig(ClassWriter.java:335)
at jdk.compiler/com.sun.tools.javac.jvm.ClassWriter.writeMethod(ClassWriter.java:1153)
at jdk.compiler/com.sun.tools.javac.jvm.ClassWriter.writeMethods(ClassWriter.java:1653)
at jdk.compiler/com.sun.tools.javac.jvm.ClassWriter.writeClassFile(ClassWriter.java:1761)
at jdk.compiler/com.sun.tools.javac.jvm.ClassWriter.writeClass(ClassWriter.java:1679)
at jdk.compiler/com.sun.tools.javac.main.JavaCompiler.genCode(JavaCompiler.java:743)
at jdk.compiler/com.sun.tools.javac.main.JavaCompiler.generate(JavaCompiler.java:1641)
at jdk.compiler/com.sun.tools.javac.main.JavaCompiler.generate(JavaCompiler.java:1609)
at jdk.compiler/com.sun.tools.javac.main.JavaCompiler.compile(JavaCompiler.java:959)
at jdk.compiler/com.sun.tools.javac.api.JavacTaskImpl.lambda$doCall$0(JavacTaskImpl.java:100)
at jdk.compiler/com.sun.tools.javac.api.JavacTaskImpl$$Lambda$97/1225568095.call(Unknown Source)
at jdk.compiler/com.sun.tools.javac.api.JavacTaskImpl.handleExceptions(JavacTaskImpl.java:142)
at jdk.compiler/com.sun.tools.javac.api.JavacTaskImpl.doCall(JavacTaskImpl.java:96)
at jdk.compiler/com.sun.tools.javac.api.JavacTaskImpl.call(JavacTaskImpl.java:90)
at com.google.devtools.build.buildjar.javac.BlazeJavacMain.compile(BlazeJavacMain.java:113)
at com.google.devtools.build.buildjar.SimpleJavaLibraryBuilder$$Lambda$70/778731861.invokeJavac(Unknown Source)
at com.google.devtools.build.buildjar.ReducedClasspathJavaLibraryBuilder.compileSources(ReducedClasspathJavaLibraryBuilder.java:57)
at com.google.devtools.build.buildjar.SimpleJavaLibraryBuilder.compileJavaLibrary(SimpleJavaLibraryBuilder.java:116)
at com.google.devtools.build.buildjar.SimpleJavaLibraryBuilder.run(SimpleJavaLibraryBuilder.java:123)
at com.google.devtools.build.buildjar.BazelJavaBuilder.processRequest(BazelJavaBuilder.java:105)
at com.google.devtools.build.buildjar.BazelJavaBuilder.runPersistentWorker(BazelJavaBuilder.java:67)
at com.google.devtools.build.buildjar.BazelJavaBuilder.main(BazelJavaBuilder.java:45)
[2,128 / 2,367] //models/openconfig:onos-models-openconfig-native; Target //:onos failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 1386.685s, Critical Path: 117.31s
INFO: 379 processes: 125 linux-sandbox, 254 worker.
**FAILED: Build did NOT complete successfully**
Your output shows java.lang.OutOfMemoryError: Java heap space. You can increase the amount of memory available to javac with something like this:
BAZEL_JAVAC_OPTS="-J-Xms384m -J-Xmx512m"
If that still doesn't work, try progressively increasing sizes for -Xmx. This issue is discussed further at:
https://github.com/bazelbuild/bazel/issues/1308
Summary
If bazel runs out of memory while building, and you see this error:
java.lang.OutOfMemoryError: Java heap space
...then do this:
Increase your RAM or your virtual memory swap file size, to emulate having more RAM (details on how to do this are below).
From now on, build with this bazel command, for example, to give Bazel more heap space (RAM) while building. In this case I am giving it 32GB maximum RAM:
# Do this to give Bazel up to 32GB of RAM wile building
time bazel --host_jvm_args=-Xmx32g build //...
# ...instead of doing this
time bazel build //...
Details
If Bazel fails with any versions of the following error, it's because it ran out of heap space while trying to build.
Example error:
java.lang.OutOfMemoryError: Java heap space
I see that error in your output you pasted. Although very much not well-known, some monster-sized projects and mono-repos can require a heap of 16GB or more, so I recommend you just create a massive 32GB~64GB swap file (virtual memory) on your Linux build machine and let it run with it! Give it the whole thing to build!
CAUTION: if you have a standard HDD (spinning Hard Disk Drive), this may cause the build to run dozens or even hundreds of times slower than using physical RAM to build! This is because HDDs are HORRIBLY HORRIBLY HORRIBLY SLOW!
BUUUUT: If you have a 2.5" or 3.5" SSD (Solid State Drive), then it works ok, or 100x BETTER STILL IF YOU HAVE AN m.2 form-factor SSD! This is because an m.2 form-factor SSD is INCREDIBLY FAST, so you can get away with HUGE swap files being used in place of RAM all the time because these disks operate so fast!
If using a top-of-the-line internal m.2 form-factor SSD, I expect the following build with virtual memory to be only ~2x slower than using physical RAM only (of the same size) to build. If you have a super slow spinning HDD, however, the same build which takes 2 hrs using a swap file on the internal m.2 SSD might take up to multiple days or more using a swap file on a spinning HDD.
Your results may vary, of course, but favor a smaller JVM bazel heap (to use less of your virtual memory), the slower you expect your virtual memory (swap file) to be.
Increase your system’s swap file (virtual memory) to at least 32~64 GB. To add or remove a swapfile, follow the detailed instructions here: https://linuxize.com/post/how-to-add-swap-space-on-ubuntu-18-04/. UPDATE: use my own instructions here instead: How do I increase the size of swapfile without removing it in the terminal?. My instructions avoid the pitfalls of fallocate by using dd instead, as I explain in my answer there.
In short, here is how to add a swapfile:
sudo dd if=/dev/zero of=/swapfile count=64 bs=1G # Create a 64 GiB file
sudo mkswap /swapfile # turn this new file into swap space
sudo chmod 0600 /swapfile # only let root read from/write to it,
# for security
sudo swapon /swapfile # enable it
swapon --show # verify this new 64GB swap file is
# now active
sudo gedit /etc/fstab # edit the /etc/fstab file to make these
# changes persistent (load them each boot)
# ADD this line to bottom (w/out the # comment symbol):
# /swapfile none swap sw 0 0
cat /proc/sys/vm/swappiness # not required: verify your systems
# "swappiness" value. Note: values now range 0 to 200 (they used to only
# go up to 100), and have a default value of 60. I highly recommend
# you follow my instructions here to set your swappiness to 0,
# however, to improve your system's performance:
# https://askubuntu.com/a/1445347/327339
To resize or delete your swapfile: if you ever need to resize your swap file you just made above, you can delete it like this:
sudo swapoff -v /swapfile # turn swap file off
sudo swapon --show # verify the swap file is off
free -h # you can also look at this as an
# indication the swap file is off
sudo rm /swapfile # remove the swap file
Then, you can either follow the instructions above again to recreate it at a new size, or if you are permanently deleting it you'll need to edit your /etc/fstab file to remove the /swapfile none swap sw 0 0 line you previously added to the bottom of it.
Add --host_jvm_args=-Xmx32g to any bazel command, right after the word bazel. This sets the max Java Virtual Memory, or bazel build heap in this case, to 32GB, which goes into your swap file once your physical RAM is full. If you have a high-speed SSD drive, which will operate surprisingly well with swap, expect to wait a few hrs max for your build to complete, depending on the repo size. If you have an old spinning HDD, expect a repo that takes 2 hrs to buld with a swap file on an internal m.2 SSD to take maybe up to several days perhaps to build with a swap file on a slow spinning HDD--especially if it's an external instead of internal HDD.
Here is a sample full bazel command with this bazel startup option added, to build an entire repo:
time bazel --host_jvm_args=-Xmx32g build //...
...instead of this:
time bazel build //...
The time addition there just prints out a more readable printout of how long the build took is all (I like it). Just be sure to set your max Java Virtual Memory allotted to bazel for any bazel build command by putting --host_jvm_args=-Xmx32g (or similar) after the word bazel any time you need it.
Note that setting the max heap like we are doing here with -Xmx is NOT the same thing as setting the default heap like others might do with -Xms. Setting the max heap still starts with the default heap but lets it grow if needed. The other answer shows setting both via an environment variable.
Done!
References:
*****[my own answer] Ask Ubuntu: How do I increase the size of swapfile without removing it in the terminal?
https://linuxize.com/post/how-to-add-swap-space-on-ubuntu-18-04/
https://serverfault.com/questions/684771/best-way-to-disable-swap-in-linux/684792#684792
My answer: How do I configure swappiness?
See also:
https://github.com/bazelbuild/bazel/issues/1308
I'm having performance issues with lcov.
I'm executing a program in seven different profiles, collecting the coverage for each of them and then merging the coverage profile with lcov:
lcov --rc lcov_branch_coverage=1 -a coverage_1.dat -a coverage_2.dat -a coverage_3.dat -a coverage_4.dat -a coverage_5.dat -a coverage_6.dat -a coverage_7.dat -o coverage_full.dat
However, this is excruciatingly slow. It takes about 10 minutes to combine my 7 profiles, this is actually longer than it takes to compile and run the 7 profiles. Each dat file is about 1M lines.
The lcov --combine and lcov --remove steps are very slow as well. Around 45 seconds for each of them.
Is there any way to speedup this combine step ? I can use several threads if necessary and I have got plenty of memory. If there are other tool that are able to do this combination correctly, I'd be interest as well (I've tried to convert the files to Cobertura and do the merge with a Python script I found, but it crashes).
If there is an alternative to lcov completely, I'm also interested. I've been using gcovr, but with it, I have to use several other tools to do the combination and it is not optimal, but it is much faster.
If there is an alternative to lcov completely, I'm also interested.
Try fastcov - it will use all available cores in parallel (it can output the report in lcov info format):
https://github.com/RPGillespie6/fastcov
It can also combine files. Note: You need GCC 9+
I have fortran code that has been parallelized with OpenMP. I want to test my code on my PC before running on HPC. My PC has double core CPU and I work on Linux-mint. I installed gfortranmultilib and this is my script:
#!/bin/bash
### Job name
#PBS -N pme
### Keep Output and Error
#PBS -j eo
### Specify the number of nodes and thread (ppn) for your job.
#PBS -l nodes=1:ppn=2
### Switch to the working directory;
cd $PBS_O_WORKDIR
### Run:
OMP_NUM_THREADS=$PBS_NUM_PPN
export OMP_NUM_THREADS
ulimit -s unlimited
./a.out
echo 'done'
What should I do more to run my code?
OK, I changed script as suggested in answers:
#!/bin/bash
### Switch to the working directory;
cd Desktop/test
### Run:
OMP_NUM_THREADS=2
export OMP_NUM_THREADS
ulimit -s unlimited
./a.out
echo 'done'
my code and its executable file are in folder test on Desktop, so:
cd Desktop/test
is this correct?
then I compile my simple code:
implicit none
!$OMP PARALLEL
write(6,*)'hi'
!$OMP END PARALLEL
end
by command:
gfortran -fopenmp test.f
and then run by:
./a.out
but only one "hi" is printed as output. What should I do?
(and a question about this site: in situation like this I should edit my post or just add a comment?)
You don't need and probably don't want to use the script on your PC. Not even to learn how to use such a script, because these scripts are too much connected to the specifics of each supercomputer.
I use several supercomputers/clusters and I cannot just reuse the script from one at the other, because they are so much different.
On your PC you should just do:
optional, it is probably the default
export OMP_NUM_THREADS=2
to set the number of OpenMP threads to 2. Adjust if you need some other number.
cd to the working directory
cd my_working_directory
Your working directory is the directory where you have the required data or where the executable resides. In your case it seems to be the directory where a.out is.
run the damn thing
ulimit -s unlimited
./a.out
That's it.
You can also store the standard output and error output to a file
./out > out.txt 2> err.txt
to mimic the supercomputer behaviour.
The PBS variables are only set when you run the script using qsub. You probably don't have that on your PC and you probably don't want to have it either.
$PBS_O_WORKDIR is the directory where you run the qsub command, unless you set it differently by other means.
$PBS_NUM_PPN is the number you indicated in #PBS -l nodes=1:ppn=2. The queue system reads that and sets this variable for you.
The script you posted is for Portable Batch System (https://en.wikipedia.org/wiki/Portable_Batch_System) queue system. That means, that the job you want to run on the HPC infrastructure has to go first into the queue system and when the resources are available the job will run on the system.
Some of the commands (those starting with #PBS) are specific commands for this queue system. Among these commands, some allow the user to indicate the application process hierarchy (i.e. number of processes and threads). Also, keep in mind that since all the PBS commands start by # they are ignored by regular shell script execution. In the case you presented, that is given by
### Specify the number of nodes and thread (ppn) for your job.
#PBS -l nodes=1:ppn=2
which as the comment indicates it should tell the queue system that you want to run 1 process and each process will have 2 threads. The queue system is likely to pass these parameters to the process launcher (srun/mpirun/aprun/... for MPI apps in addition to OMP_NUM_THREADS for OpenMP apps).
If you want to run this job on a computer that does not have PBS queue, you should be aware at least of two things.
1) The following command
### Switch to the working directory;
cd $PBS_O_WORKDIR
will be translated into "cd" because the environment variable PBS_O_WORKDIR is only defined within the PBS job context. So, you should change this command (or execute another cd command just before the execution) in order to fix where you want to run the job.
2) Similarly for PBS_NUM_PPN environment variable,
OMP_NUM_THREADS=$PBS_NUM_PPN
export OMP_NUM_THREADS
this variable won't be defined if you don't run this within a PBS job context, so you should set OMP_NUM_THREADS to the value you want (2, according to your question) manually.
If you want your linux box environment to be like an HPC login node. You can do the following
Make sure that your compiler supports OpenMP, test a simple hello world program with OpenMP flags
Install OpenMPI on your system from your favourite package manager or download the source/binary from the website (OpenMPI Download)
I would not recommend installing cluster manager like Slurm for your experiments
After you are done, you can execute your MPI programs through the mpirun wrapper
mpirun -n <no_of_cores> <executable>
EDIT:
This is assuming that you are running this only MPI. Note that OpenMP utilizes the cores as well. If you are running MPI+OpenMP - n*OMP_NUM_THREADS=cores on a single node.
I have a .bat (Batch) file compiling an OS I am making in Windows 7. I use nasm to compile the code, then dd and imdisk. Although probably not worth mentioning, after compilation I use mkisofs to make a .iso then VirtualBox for testing.
When it gets past "Copying kernel and files to disk" it (the output in the prompt) says The volume does not contain a recognized file system. Please make sure that all required file system drivers are loaded and that the volume is not corrupted. 0 file(s) copied.
It compiled before in which I could properly test the OS and some text would appear, then I noticed nothing appeared on the screen anymore, so I further looked at the prompt output and batch file and realized this.
Can anyone give me some help here? I really want it to compile.
Here is the main part that errors in my batch file:
echo Adding boot to disk
cd build
dd if=..\src\boot\boot.bin bs=512 of=myos.flp
cd ..
echo Mounting disk image
imdisk -a -f build\myos.flp -s 1440K -m B:
echo Copying kernel and files to disk
copy src\kernel.bin b:\
echo Dismounting disk image
imdisk -D -m B:
I am able to successfully compile MikeOS (my inspiration) with pretty much the same commands (I modified the MikeOS buildwin.bat to use dd), so I have no idea what is happening.
EDIT: I even tried this in Ubuntu with dd and it doesn't work! When I get to the mount part it says something about the filesystem...I think my dd command is wrong, but for some reason I can compile MikeOS correctly. Ugh.
The boot file should always be in the first sector. Try this:
dd seek=0 if=..\src\boot\boot.bin bs=512 of=myos.flp
seek=0 copies the boot file to the first sector. If you don't put it, the dd command may copy the file to somewhere else.