java.lang.OutOfMemoryError when running bazel build
I have been trying to install the ONOS controller on an Ubuntu VM on my Mac, following the steps in this link: Download ONOS code & Build ONOS.
However, the build fails after I run the following command:
~/onos$ bazel build onos
The above command outputs the following:
Starting local Bazel server and connecting to it...
INFO: Analysed target //:onos (759 packages loaded, 12923 targets configured).
INFO: Found 1 target...
.
.
.
enconfig-native; [2,128 / 2,367] //models/openconfig:onos-models-openconfig-native; ERROR: /home/mohamedzidan/onos/models/openconfig/BUILD:11:1: Building models/openconfig/libonos-models-openconfig-native-class.jar (2 source jars) failed (Exit 1)
[2,128 / 2,367] //models/openconfig:onos-models-openconfig-native; An exception has occurred in the compiler (10.0.1). Please file a bug against the Java compiler via the Java bug reporting page (http://bugreport.java.com) after checking the Bug Database (http://bugs.java.com) for duplicates. Include your program and the following diagnostic in your report. Thank you.
java.lang.OutOfMemoryError: Java heap space
at jdk.compiler/com.sun.tools.javac.util.ArrayUtils.ensureCapacity(ArrayUtils.java:60)
at jdk.compiler/com.sun.tools.javac.util.SharedNameTable.fromUtf(SharedNameTable.java:132)
at jdk.compiler/com.sun.tools.javac.util.Names.fromUtf(Names.java:392)
at jdk.compiler/com.sun.tools.javac.util.ByteBuffer.toName(ByteBuffer.java:159)
at jdk.compiler/com.sun.tools.javac.jvm.ClassWriter$CWSignatureGenerator.toName(ClassWriter.java:320)
at jdk.compiler/com.sun.tools.javac.jvm.ClassWriter$CWSignatureGenerator.access$300(ClassWriter.java:266)
at jdk.compiler/com.sun.tools.javac.jvm.ClassWriter.typeSig(ClassWriter.java:335)
at jdk.compiler/com.sun.tools.javac.jvm.ClassWriter.writeMethod(ClassWriter.java:1153)
at jdk.compiler/com.sun.tools.javac.jvm.ClassWriter.writeMethods(ClassWriter.java:1653)
at jdk.compiler/com.sun.tools.javac.jvm.ClassWriter.writeClassFile(ClassWriter.java:1761)
at jdk.compiler/com.sun.tools.javac.jvm.ClassWriter.writeClass(ClassWriter.java:1679)
at jdk.compiler/com.sun.tools.javac.main.JavaCompiler.genCode(JavaCompiler.java:743)
at jdk.compiler/com.sun.tools.javac.main.JavaCompiler.generate(JavaCompiler.java:1641)
at jdk.compiler/com.sun.tools.javac.main.JavaCompiler.generate(JavaCompiler.java:1609)
at jdk.compiler/com.sun.tools.javac.main.JavaCompiler.compile(JavaCompiler.java:959)
at jdk.compiler/com.sun.tools.javac.api.JavacTaskImpl.lambda$doCall$0(JavacTaskImpl.java:100)
at jdk.compiler/com.sun.tools.javac.api.JavacTaskImpl$$Lambda$97/1225568095.call(Unknown Source)
at jdk.compiler/com.sun.tools.javac.api.JavacTaskImpl.handleExceptions(JavacTaskImpl.java:142)
at jdk.compiler/com.sun.tools.javac.api.JavacTaskImpl.doCall(JavacTaskImpl.java:96)
at jdk.compiler/com.sun.tools.javac.api.JavacTaskImpl.call(JavacTaskImpl.java:90)
at com.google.devtools.build.buildjar.javac.BlazeJavacMain.compile(BlazeJavacMain.java:113)
at com.google.devtools.build.buildjar.SimpleJavaLibraryBuilder$$Lambda$70/778731861.invokeJavac(Unknown Source)
at com.google.devtools.build.buildjar.ReducedClasspathJavaLibraryBuilder.compileSources(ReducedClasspathJavaLibraryBuilder.java:57)
at com.google.devtools.build.buildjar.SimpleJavaLibraryBuilder.compileJavaLibrary(SimpleJavaLibraryBuilder.java:116)
at com.google.devtools.build.buildjar.SimpleJavaLibraryBuilder.run(SimpleJavaLibraryBuilder.java:123)
at com.google.devtools.build.buildjar.BazelJavaBuilder.processRequest(BazelJavaBuilder.java:105)
at com.google.devtools.build.buildjar.BazelJavaBuilder.runPersistentWorker(BazelJavaBuilder.java:67)
at com.google.devtools.build.buildjar.BazelJavaBuilder.main(BazelJavaBuilder.java:45)
[2,128 / 2,367] //models/openconfig:onos-models-openconfig-native; Target //:onos failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 1386.685s, Critical Path: 117.31s
INFO: 379 processes: 125 linux-sandbox, 254 worker.
**FAILED: Build did NOT complete successfully**
Your output shows java.lang.OutOfMemoryError: Java heap space. You can increase the amount of memory available to javac with something like this:
BAZEL_JAVAC_OPTS="-J-Xms384m -J-Xmx512m"
If that still doesn't work, try progressively increasing sizes for -Xmx. This issue is discussed further at:
https://github.com/bazelbuild/bazel/issues/1308
Summary
If bazel runs out of memory while building, and you see this error:
java.lang.OutOfMemoryError: Java heap space
...then do this:
Increase your RAM or your virtual memory swap file size, to emulate having more RAM (details on how to do this are below).
From now on, build with a command like the one below to give Bazel more heap space (RAM) while building. In this case I am giving it a maximum of 32GB:
# Do this to give Bazel up to 32GB of RAM while building
time bazel --host_jvm_args=-Xmx32g build //...
# ...instead of doing this
time bazel build //...
Details
If Bazel fails with any version of the following error, it's because it ran out of heap space while trying to build.
Example error:
java.lang.OutOfMemoryError: Java heap space
I see that error in the output you pasted. It is not well known, but some monster-sized projects and mono-repos can require a heap of 16GB or more, so I recommend you just create a massive 32~64GB swap file (virtual memory) on your Linux build machine and let the build run with it! Give it the whole thing to build!
CAUTION: if you have a standard HDD (spinning Hard Disk Drive), this may cause the build to run dozens or even hundreds of times slower than using physical RAM to build! This is because HDDs are HORRIBLY HORRIBLY HORRIBLY SLOW!
BUUUUT: If you have a 2.5" or 3.5" SSD (Solid State Drive), then it works ok, or 100x BETTER STILL IF YOU HAVE AN m.2 form-factor SSD! This is because an m.2 form-factor SSD is INCREDIBLY FAST, so you can get away with HUGE swap files being used in place of RAM all the time because these disks operate so fast!
If using a top-of-the-line internal m.2 form-factor SSD, I expect the following build with virtual memory to be only ~2x slower than using physical RAM only (of the same size) to build. If you have a super slow spinning HDD, however, the same build which takes 2 hrs using a swap file on the internal m.2 SSD might take up to multiple days or more using a swap file on a spinning HDD.
Your results may vary, of course, but the slower you expect your virtual memory (swap file) to be, the smaller the Bazel JVM heap you should favor, so that less of the build ends up in virtual memory.
Increase your system’s swap file (virtual memory) to at least 32~64 GB. To add or remove a swapfile, follow the detailed instructions here: https://linuxize.com/post/how-to-add-swap-space-on-ubuntu-18-04/. UPDATE: use my own instructions here instead: How do I increase the size of swapfile without removing it in the terminal?. My instructions avoid the pitfalls of fallocate by using dd instead, as I explain in my answer there.
In short, here is how to add a swapfile:
sudo dd if=/dev/zero of=/swapfile count=64 bs=1G  # create a 64 GiB file
sudo chmod 0600 /swapfile    # only let root read from/write to it, for security
                             # (done before mkswap so mkswap does not warn
                             # about insecure permissions)
sudo mkswap /swapfile        # turn this new file into swap space
sudo swapon /swapfile        # enable it
swapon --show                # verify this new 64 GiB swap file is now active
sudo gedit /etc/fstab        # edit the /etc/fstab file to make these
                             # changes persistent (load them each boot)
                             # ADD this line to bottom (w/out the # comment symbol):
                             #   /swapfile none swap sw 0 0
cat /proc/sys/vm/swappiness  # not required: verify your system's
    # "swappiness" value. Note: values now range 0 to 200 (they used to only
    # go up to 100), and have a default value of 60. I highly recommend
    # you follow my instructions here to set your swappiness to 0,
    # however, to improve your system's performance:
    # https://askubuntu.com/a/1445347/327339
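The manual /etc/fstab edit in the steps above can also be scripted. The following is only a minimal sketch and is not part of the original instructions: it appends the swap entry if, and only if, it is not already there. FSTAB_FILE is a hypothetical variable introduced for this example; it defaults to a scratch file so the logic can be tried safely, and you would point it at /etc/fstab (and run as root) only once you are happy with the result.

```shell
#!/bin/sh
# Sketch: append the swapfile entry to an fstab-style file exactly once.
# FSTAB_FILE is a hypothetical variable (not from the original answer); it
# defaults to a scratch file so nothing on the real system is modified.
FSTAB_FILE=${FSTAB_FILE:-$(mktemp)}
SWAP_ENTRY='/swapfile none swap sw 0 0'

add_swap_entry() {
    # -x matches whole lines only, -F treats the entry as a fixed string
    if grep -qxF "$SWAP_ENTRY" "$FSTAB_FILE"; then
        echo "entry already present in $FSTAB_FILE"
    else
        printf '%s\n' "$SWAP_ENTRY" >> "$FSTAB_FILE"
        echo "entry added to $FSTAB_FILE"
    fi
}

add_swap_entry
add_swap_entry   # the second call changes nothing
```

Running it twice is deliberate: it shows that repeating the step does not leave duplicate lines behind, which is easy to do by accident when editing the file by hand.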
To resize or delete your swapfile: if you ever need to resize or remove the swap file you just made above, first delete it like this:
sudo swapoff -v /swapfile # turn swap file off
sudo swapon --show # verify the swap file is off
free -h # you can also look at this as an
# indication the swap file is off
sudo rm /swapfile # remove the swap file
Then, you can either follow the instructions above again to recreate it at a new size, or if you are permanently deleting it you'll need to edit your /etc/fstab file to remove the /swapfile none swap sw 0 0 line you previously added to the bottom of it.
Add --host_jvm_args=-Xmx32g to any bazel command, right after the word bazel. This sets the maximum heap size of Bazel's Java Virtual Machine to 32GB; once your physical RAM is full, the heap spills into your swap file. If you have a high-speed SSD, which works surprisingly well with swap, expect to wait a few hrs max for your build to complete, depending on the repo size. If you have an old spinning HDD, expect a repo that takes 2 hrs to build with a swap file on an internal m.2 SSD to take up to several days with a swap file on the slow spinning HDD, especially if it's an external instead of internal HDD.
Here is a sample full bazel command with this bazel startup option added, to build an entire repo:
time bazel --host_jvm_args=-Xmx32g build //...
...instead of this:
time bazel build //...
The time prefix there just prints out how long the build took (I like it). Just be sure to set the max JVM heap allotted to bazel for any bazel build command by putting --host_jvm_args=-Xmx32g (or similar) after the word bazel any time you need it.
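If typing the startup option every time is inconvenient, Bazel can also read startup options from a .bazelrc file in the workspace root. The sketch below is an addition, not something the answer itself does. WORKSPACE_DIR is a hypothetical placeholder for your checkout; it defaults to a temporary directory so the example can be run without touching a real workspace.

```shell
#!/bin/sh
# Sketch: persist the startup option in a workspace .bazelrc.
# WORKSPACE_DIR is a hypothetical placeholder; by default a throwaway
# directory is used instead of a real Bazel workspace.
WORKSPACE_DIR=${WORKSPACE_DIR:-$(mktemp -d)}
RC_LINE='startup --host_jvm_args=-Xmx32g'

# Append the line only if it is not already in the file.
grep -qxF "$RC_LINE" "$WORKSPACE_DIR/.bazelrc" 2>/dev/null ||
    printf '%s\n' "$RC_LINE" >> "$WORKSPACE_DIR/.bazelrc"

cat "$WORKSPACE_DIR/.bazelrc"
```

With such a line in place, a plain bazel build //... run from that workspace should start the Bazel server with the larger heap. Because it is a startup option, Bazel restarts its server whenever the value changes.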
Note that setting the max heap like we are doing here with -Xmx is NOT the same thing as setting the initial heap like others might do with -Xms. With only the max set, the JVM still starts with its default initial heap but can grow up to the max if needed. The other answer shows setting both via an environment variable.
Done!
References:
*****[my own answer] Ask Ubuntu: How do I increase the size of swapfile without removing it in the terminal?
https://linuxize.com/post/how-to-add-swap-space-on-ubuntu-18-04/
https://serverfault.com/questions/684771/best-way-to-disable-swap-in-linux/684792#684792
My answer: How do I configure swappiness?
See also:
https://github.com/bazelbuild/bazel/issues/1308
Related
Guide for installation of NVIDIA’s nvCOMP and running of its accompanying examples
I don’t understand the instructions given here and here. Could someone offer a step-by-step guide for the installation of nvCOMP, using the following assumptions and step format (or equivalent)?
System info:
Ubuntu 20.04
RTX-3060
NVIDIA driver 470.82.01
CUDA 11.4
GCC 9.4.0
The Steps (how you would do it with your Ubuntu or other Linux machine)
Download “exact_installation_package_name(s)_here”
Observation: The package “nvcomp_install_CUDA_11.x.tgz” from NVIDIA has the exact structure as described here. However, this package seems to be different from the “nvcomp” folder obtained from using git clone https://github.com/NVIDIA/nvcomp.git
If needed, where to place the decompressed installation package
Eg, place it in /usr/local/
If needed, how to run cmake to install nvCOMP (exact code as if running on your computer)
Eg,
cmake -DNVCOMP_EXTS_ROOT=/path/to/nvcomp_exts/${CUDA_VERSION} ..
make -j
(code from this site)
However, is CUDA_VERSION a literal string or a placeholder for, say, CUDA_11.4? Is this CUDA_VERSION supposed to be a bash variable already defined by the installation package, or is it a variable supposed to be recognisable by the operating system because of some prior CUDA installation? Besides, what exactly is nvcomp_exts or what does it refer to?
If needed, the code for specifying the path(s) in ~/.bashrc
If needed, how to cmake the sample codes, ie, in which directory to run the terminal and what exact code to run
The exact folder+code sequence to build and run “high_level_quickstart_example.cpp”, which comes with the installation package.
Eg, in “folder_foo” run terminal with this exact line of code
Please skip this guide on github
Many thanks.
I will answer my own question.
System info
Here is the system information obtained from the command line:
uname -r: 5.15.0-46-generic
lsb_release -a: Ubuntu 20.04.5 LTS
nvcc --version: Cuda compilation tools, release 10.1, V10.1.243
nvidia-smi: Two Tesla K80 (2-in-1 card) and one GeForce (Gigabyte RTX 3060 Vision 12G rev. 2.0); NVIDIA-SMI 470.82.01, Driver Version: 470.82.01, CUDA Version: 11.4
cmake --version: cmake version 3.22.5
make --version: GNU Make 4.2.1
lscpu: Xeon CPU E5-2680 V4 @ 2.40GHz - 56 CPU(s)
Observation
Although there are two GPUs installed in the server, nvCOMP only works with the RTX.
The Steps
Perhaps "installation" is a misnomer. One only needs to properly compile the downloaded nvCOMP files and run the resulting executables.
Step 1: The nvCOMP library
Download the nvCOMP library from https://developer.nvidia.com/nvcomp.
The file I downloaded was named nvcomp_install_CUDA_11.x.tgz. I left the extracted folder in the Downloads directory and renamed it nvcomp.
Step 2: The nvCOMP test package on GitHub
Download it from https://github.com/NVIDIA/nvcomp. Click the green "Code" icon, then click "Download ZIP".
By default, the downloaded zip file is called nvcomp-main.zip. I left the extracted folder, named nvcomp-main, in the Downloads directory.
Step 3: The NVIDIA CUB library on GitHub
Download it from https://github.com/nvidia/cub. Click the green "Code" icon, then click "Download ZIP".
By default, the downloaded zip file is called cub-main.zip. I left the extracted folder, named cub-main, in the Downloads directory.
There is no "installation" of the CUB library other than making the folder path "known", ie available, to the calling program.
Comments: The nvCOMP GitHub site did not seem to explain that the CUB library was needed to run nvCOMP; I only found that out from an error message during an attempted compilation of the test files from Step 2.
Step 4: "Building CPU and GPU Examples, GPU Benchmarks provided on Github"
The nvCOMP GitHub landing page has a section with exactly this name. The instructions could have been more detailed.
Step 4.1: cmake
The Downloads directory now holds the folders nvcomp (the Step 1 nvCOMP library), nvcomp-main (Step 2), and cub-main (Step 3).
Start a terminal and go inside nvcomp-main, ie, go to /your-path/Downloads/nvcomp-main
Run
cmake -DCMAKE_PREFIX_PATH=/your-path/Downloads/nvcomp -DCUB_DIR=/your-path/Downloads/cub-main
This cmake step sets up the build files for the next "make" step.
During cmake, a harmless yellow-colored cmake warning appeared.
There was also a harmless printout "-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed", per this thread.
The last few printout lines from cmake variously stated that it found Threads, nvcomp, and ZLIB (on my system), and that it was done with "Configuring" and "Build files have been written".
Step 4.2: make
Run make in the same terminal as above.
Please check the before and after folder tree to see what files have been generated.
Step 5: Running the examples/benchmarks
Let's run the "built-in" example before running the benchmarks with the (now outdated) Fannie Mae single-family loan performance data from NVIDIA's RAPIDS repository.
Check if there are executables in /your-path/Downloads/nvcomp-main/bin. These are the executables created by the cmake and make steps above.
The executables are built with different compression algorithms and functionalities, and you can try running them on the files you want to compress. The name of each executable indicates the algorithm used and/or its functionality.
Some of the executables require the files to be of a certain size, eg, the "benchmark_cascaded_chunked" executable requires the target file's size to be a multiple of 4 bytes. I have not tested all of these executables.
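The configure-and-build commands of Step 4 can be collected into a small script. This is only a sketch under assumptions: DOWNLOADS and DRY_RUN are hypothetical names introduced here, the folder layout is the one described in Steps 1 to 3, and the trailing "." that names the source directory explicitly is an addition that the command in the answer does not have. By default the script only prints what it would run.

```shell
#!/bin/sh
# Sketch of Step 4 as a script. DOWNLOADS and DRY_RUN are hypothetical
# names, not part of nvCOMP or of the original answer.
DOWNLOADS=${DOWNLOADS:-/your-path/Downloads}
DRY_RUN=${DRY_RUN:-1}    # 1 = only print the commands, anything else = run them

run() {
    echo "+ $*"
    [ "$DRY_RUN" = 1 ] || "$@"
}

build_nvcomp_examples() {
    # Only change directory when the commands are really executed.
    [ "$DRY_RUN" = 1 ] || cd "$DOWNLOADS/nvcomp-main" || return 1
    # The trailing "." (explicit source directory) is an assumption added here.
    run cmake -DCMAKE_PREFIX_PATH="$DOWNLOADS/nvcomp" \
              -DCUB_DIR="$DOWNLOADS/cub-main" .
    run make
}

build_nvcomp_examples
```

To actually build, set DOWNLOADS to the real directory and run the script with DRY_RUN=0.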
Step 5.1: CPU compression examples
Per https://github.com/NVIDIA/nvcomp
Start a terminal (anywhere)
Run
time /your-path/Downloads/nvcomp-main/bin/gdeflate_cpu_compression -f /full-path-to-your-target/my-file.txt
I ran gdeflate_cpu_compression this way on an updated Fannie Mae loan data file, "2002Q1.csv" (11GB).
Similarly, change the name of the executable to run lz4_cpu_compression or lz4_cpu_decompression
Step 5.2: The benchmarks with the Fannie Mae files from NVIDIA Rapids
Apart from following the NVIDIA instructions here, it seems the "benchmark" executables in the above "bin" directory can be run with "any" file. Just use the executable in the same way as in Step 5.1 and adhere to the particular executable's specifications.
Below is one example following the NVIDIA instructions.
Long story short, the nvcomp-main (Step 2) test package contains the files to (i) extract a column of homogeneous data from an outdated Fannie Mae loan data file, (ii) save the extraction in binary format, and (iii) run the benchmark executable(s) on the binary extraction.
The Fannie Mae single-family loan performance data files, old or new, all use "|" as the delimiter. In the outdated Rapids version, the first column, indexed as column "0" in the code (zero-based numbering), contains the 12-digit loan IDs for the loans sampled from the (real) Fannie Mae loan portfolio. In the new Fannie Mae data files from the official Fannie Mae site, the loan IDs are in column 2 and the data files have a csv file extension.
Download the "1 Year" Fannie Mae dataset, not the "1GB Splits*" variant, by following the link from here, or by going directly to RAPIDS
Place the downloaded mortgage_2000.tgz anywhere and unpack it with tar -xvzf mortgage_2000.tgz.
There are four txt files in /mortgage_2000/perf. I will use Performance_2000Q1.txt as an example.
Check if python is installed on the system
Check if text_to_binary.py is in /nvcomp-main/benchmarks
Start a terminal (anywhere)
As shown below, use the python script to extract the first column, indexed "0", with format long, from Performance_2000Q1.txt, and put the .bin output file somewhere.
Run
time python /your-path/Downloads/nvcomp-main/benchmarks/text_to_binary.py /your-other-path-to/mortgage_2000/perf/Performance_2000Q1.txt 0 long /another-path/2000Q1-col0-long.bin
For comparison of the benchmarks, run
time python /your-path/Downloads/nvcomp-main/benchmarks/text_to_binary.py /your-other-path-to/mortgage_2000/perf/Performance_2000Q1.txt 0 string /another-path/2000Q1-col0-string.bin
Run the benchmarking executables with the target bin files as shown at the bottom of the web page of the NVIDIA official guide
Eg,
/your-path/Downloads/nvcomp-main/bin/benchmark_hlif lz4 -f /another-path/2000Q1-col0-long.bin
Just make sure the operating system knows where the executable and the target file are.
Step 5.3: The high_level_quickstart_example and low_level_quickstart_example
These two executables are in /nvcomp-main/bin
They are completely self-contained. Just run, eg, high_level_quickstart_example without any input arguments. Please see the corresponding C++ source code in /nvcomp-main/examples and the official nvCOMP guides on GitHub.
Observations after some experiments
This could be another long thread but let's keep it short. Note that NVIDIA used various A-series cards for its benchmarks and I used a GeForce RTX 3060.
Speed
The python script is slow. It took 4m12.456s to extract the loan ID column from an 11.8 GB Fannie Mae data file (with 108 columns) using format "string". In contrast, R with data.table took 25.648 seconds to do the same.
With the outdated "Performance_2000Q1.txt" (0.99 GB) tested above, the python script took 32.898s whereas R took 26.965s to do the same extraction.
Compression ratio
"Bloated" python outputs.
The R-output "string.txt" files are generally a quarter of the size of the corresponding python-output "string.bin" files.
Applying the executables to the R-output files achieved much better compression ratios and throughputs than applying them to the python-output files.
Eg, running benchmark_hlif lz4 -f 2000Q1-col0-string.bin with the python output vs running benchmark_hlif lz4 -f 2000Q1-col0-string.txt with the R output:
Uncompressed size: 436,544,592 vs 118,230,827 bytes
Compressed size: 233,026,108 vs 4,154,261 bytes
Compression ratio: 1.87 vs 28.46
Compression throughput (GB/s): 2.42 vs 18.96
Decompression throughput (GB/s): 8.86 vs 91.50
Wall time: 2.805s vs 1.281s
Overall performance: accounting for file size and memory limits
Use of the nvCOMP library is limited by the GPU memory, no more than 12GB for the RTX 3060 tested. Depending on the compression algorithm, an 8GB target file can easily trigger a stop with cudaErrorMemoryAllocation: out of memory
In both speed and compression ratio, pigz trumped the tested nvCOMP executables when the target files were the new Fannie Mae data files containing 108 columns of strings and numbers.
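Two of the points above are easy to try from a shell. The sketch below is an addition and not the method used in the answer (which used the Python script and R). It pulls one "|"-delimited column out of a text file with cut, giving plain text output, and it recomputes the reported compression ratios from the reported sizes. extract_column and ratio are hypothetical helper names, and the two sample rows are made up purely for illustration.

```shell
#!/bin/sh
# extract_column FILE N: print the N-th (zero-based) '|'-delimited column.
# Hypothetical helper; cut counts fields from 1, hence the +1.
extract_column() {
    cut -d '|' -f "$(( $2 + 1 ))" "$1"
}

# ratio UNCOMPRESSED COMPRESSED: compression ratio to two decimals.
ratio() {
    awk -v u="$1" -v c="$2" 'BEGIN { printf "%.2f\n", u / c }'
}

# Made-up sample rows in the same '|'-delimited layout (not real loan data).
sample=$(mktemp)
printf '100000000001|01/01/2000|A\n100000000002|02/01/2000|B\n' > "$sample"

extract_column "$sample" 0     # prints the two made-up loan IDs

# Sizes in bytes as reported above: python output, then R output.
ratio 436544592 233026108      # prints 1.87
ratio 118230827 4154261        # prints 28.46
```

The two ratios match the figures listed above, which also confirms that the ratio is a plain quotient with no unit.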
Multipass log (multipassd.log) bloating disk
My multipassd.log had grown to 200+ GB before I noticed (because my disk was full). Stupidly, I deleted the log with rm -rf multipassd.log (the file was so big I couldn't open it). This apparently deleted the file without freeing the space on disk, so now I had 200+ GB of inaccessible disk space. The space did not show up as used when checking the file system with the du command, even from the root directory. I also downloaded DaisyDisk, which showed that there were 200 GB of "hidden files" but couldn't access or delete them, even with all privileges enabled. Eventually I fixed it, but if anyone can explain why rm -rf multipassd.log failed to free the disk space, that would be appreciated :)
After messing around for a couple of hours, I fixed it by opening Console, then creating a new log file and reloading the multipass launcher daemon with
$ sudo touch /Library/Logs/Multipass/multipassd.log
$ sudo launchctl unload /Library/LaunchDaemons/com.canonical.multipassd.plist
$ sudo launchctl load /Library/LaunchDaemons/com.canonical.multipassd.plist
Then I started an instance of Multipass and cleared the (almost empty) log from the already opened Console. That freed up the disk space :)
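The thread never says why rm did not free the space. A likely explanation, not confirmed here, is standard Unix behaviour: rm removes only the file's name, and the data blocks are released once no process has the file open any more. If the multipassd daemon still held the log open, the space would stay allocated until the daemon was restarted, which is consistent with the unload/load fix above. The sketch below demonstrates the behaviour on a throwaway temporary file, with file descriptor 3 standing in for the daemon's open log.

```shell
#!/bin/sh
# Demonstration on a temporary file: deleting a file that is still open
# removes its name but not its contents.
demo_file=$(mktemp)
exec 3>>"$demo_file"       # keep descriptor 3 open, like a daemon holding its log
echo "first line" >&3

rm -f "$demo_file"         # removes the directory entry only

name_state="name still present"
[ -e "$demo_file" ] || name_state="name removed"

fd_state="descriptor not writable"
if echo "second line" >&3; then
    fd_state="descriptor still writable"
fi

echo "$name_state; $fd_state"

exec 3>&-                  # closing the last descriptor lets the space be reclaimed
```

Tools that walk the directory tree, such as du, cannot see a file in this state, which would explain the "hidden" 200 GB. On systems where lsof is installed, lsof +L1 is one way to list files that are open but already unlinked.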
"virtual memory exhausted" when building Docker image
When building a Docker image, some C++ sources are compiled, and I ended up with errors like:
src/amun/CMakeFiles/cpumode.dir/build.make:134: recipe for target 'src/amun/CMakeFiles/cpumode.dir/cpu/decoder/encoder_decoder_state.cpp.o' failed
virtual memory exhausted: Cannot allocate memory
But when building the same .cpp code on the host machine, it works fine.
After some checking, the error message seems to be similar to the one that people get on a Raspberry Pi, https://www.bitpi.co/2015/02/11/how-to-change-raspberry-pis-swapfile-size-on-rasbian/
And after some more googling, this post on the Mac forum says that:
Swapfiles are dynamically created as needed, until either the disk is full, or the kernel runs out of page table space. I do not think you can change the page table space limits in the Mac OS X kernel. I have not seen anything in all the years I've been using OS X.
Is there a way to increase the swap space for Docker build on Mac OS?
If not, how else can I overcome the "virtual memory exhausted" error when building a Docker image?
That does not seem trivial to do with XHyve. As stated in this thread:
I think the default size of the VM is 16GB. I kept running out of swap space even after bumping the ram on the VM up to 16GB.
Check if the method used for a VirtualBox VM would apply in XHyve: see "How to increase the swap space available in the boot2docker virtual machine?"
boot2docker ssh
export SWAPFILE=/mnt/sda1/swapfile
sudo dd if=/dev/zero of=$SWAPFILE bs=1024 count=4194304
sudo mkswap $SWAPFILE
sudo chmod 600 $SWAPFILE
sudo swapon $SWAPFILE
exit
Check also this Digital Ocean setup, again to test in your XHyve context. mkswap is also seen here or in docker-root-xhyve/contrib/makehdd/makehdd.sh.
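The dd arguments above determine the swap size: bs=1024 with count=4194304 writes 1024 × 4194304 bytes, ie 4 GiB. A small sketch of that arithmetic follows, in case a different size is wanted. swap_size_gib and count_for_gib are hypothetical helper names introduced here, not existing tools.

```shell
#!/bin/sh
# swap_size_gib BLOCK_SIZE_BYTES COUNT: size in GiB of the file dd would create.
swap_size_gib() {
    echo $(( $1 * $2 / 1024 / 1024 / 1024 ))
}

# count_for_gib BLOCK_SIZE_BYTES GIB: the count= value needed for a given size.
count_for_gib() {
    echo $(( $2 * 1024 * 1024 * 1024 / $1 ))
}

swap_size_gib 1024 4194304   # prints 4
count_for_gib 1024 8         # prints 8388608
```

For example, an 8 GiB swap file with the same block size would use count=8388608.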
Since you have enough memory available on your host, I recommend assigning more memory to the Docker VM behind it. As stated here:
As I can see that you are on OSX, which runs docker over a Linux VM. Configure the max memory by clicking the whale icon in the task bar. By default it is 2G.
For further information: https://docs.docker.com/docker-for-mac/#memory
Compiling on Vortex86: "Illegal instruction"
I'm using an embedded PC which has a Vortex86-SG CPU, running Ubuntu 10.04 with kernel 2.6.34.10-vortex86-sg. Unfortunately we can't compile a new kernel, because we don't have any source code, not even drivers or patches.
I have to run a small project written in C++ with OpenFrameworks. The framework compiles correctly with each of the scripts in of_v0071_linux_release/scripts/linux/ubuntu/install_*.sh.
I noticed that in order to compile against Vortex86/Ubuntu 10.04, the following options must be added in every config.make file:
USER_CFLAGS = -march=i486
USER_LDFLAGS = -lGLEW
In effect, it compiles without errors, but the generated binary doesn't start at all:
root@jb:~/openframeworks/of_v0071_linux_release/apps/myApps/emptyExample/bin# ./emptyExample
Illegal instruction
root@jb:~/openframeworks/of_v0071_linux_release/apps/myApps/emptyExample/bin# echo $?
132
Strace last lines:
munmap(0xb77c3000, 4096) = 0
rt_sigprocmask(SIG_BLOCK, [PIPE], NULL, 8) = 0
--- SIGILL (Illegal instruction) @ 0 (0) ---
+++ killed by SIGILL +++
Illegal instruction
root@jb:~/openframeworks/of_v0071_linux_release/apps/myApps/emptyExample/bin#
Any idea how to solve this problem?
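The exit status 132 printed by echo $? fits the strace output: a shell reports a process killed by a signal as 128 plus the signal number, and SIGILL is signal 4 on Linux. The sketch below is an illustration added here, not part of the question. It reproduces that status by having a child shell send SIGILL to itself, then decodes it.

```shell
#!/bin/sh
# A child shell sends SIGILL to itself; the parent then inspects the status.
# (The parent shell may print an "Illegal instruction" notice of its own.)
sh -c 'kill -s ILL $$' 2>/dev/null
status=$?

# Statuses above 128 mean "terminated by signal (status - 128)".
signal_number=$(( status - 128 ))
echo "exit status: $status, signal number: $signal_number"
```

So a status of 132 on its own already says the program was killed by SIGILL, ie it tried to execute an instruction the CPU rejected.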
I know I am a bit late on this, but I recently had my own issues trying to compile the kernel for the Vortex86DX, and I was finally able to build it. Use these steps at your own risk, as I am not a Linux guru, and you may have to change some settings to suit your own preferences/hardware:
Download and use a Linux distribution that runs a kernel version similar to the one you plan on compiling. Since I was compiling Linux 2.6.34.14, I downloaded and installed Debian 6 on VirtualBox with adequate RAM and processor allocations. You could potentially compile on the Vortex86DX itself, but that would likely take forever.
I made sure I had the dependencies:
# apt-get install ncurses-dev kernel-package
Download the kernel from kernel.org (I grabbed Linux-2.6.34.14.tar.xz). Extract the files from the package.
Grab the config file from the DMP ftp site: ftp://vxmx:gc301@ftp.dmp.com.tw/Linux/Source/config-2.6.34-vortex86-sg-r1.zip. Please note the vxmx user name. Copy the config file to the freshly extracted Linux source folder.
Grab the patch at ftp://vxdx:gc301@ftp.dmp.com.tw/Driver/Linux/config%26patch/patch-2.6.34-hda.zip. Please note the vxdx user name. Copy it to the kernel source folder.
Patch the kernel:
# patch -p1 < patchfilename
Configure the kernel with
# make menuconfig
Load Alternate Configuration File
Enable generic x86 support
Enable Math Emulation
I disabled generic IDE support because I will be using legacy mode (selectable in the BIOS)
Under Device Drivers -> Ethernet (10 or 100Mbit) -> make sure RDC R6040 Fast Ethernet Adapter Support is selected
USB support -> select Support for Host-side USB, EHCI HCD (USB 2.0) support, OHCI HCD support
Save the config as .config
Check the serial ports: edit .config manually and make sure CONFIG_SERIAL_8250_NR_UARTS=4 (or more if you have additional ports) and CONFIG_SERIAL_8250_RUNTIME_UARTS=4 (or more if you have additional ports). If you are going to use more than 4 serial ports, make sure CONFIG_SERIAL_8250_MANY_PORTS is set.
Compile the kernel headers and source:
# make-kpkg --initrd kernel_image kernel_source kernel_headers modules_image
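The manual serial-port check in the steps above can be scripted before starting the long compile. This is a minimal sketch, not part of the original steps. config_value and check_min are hypothetical helper names, and the sample file stands in for the real .config in the kernel source folder.

```shell
#!/bin/sh
# config_value FILE KEY: print the value assigned to KEY in a kernel .config.
config_value() {
    sed -n "s/^$2=//p" "$1"
}

# check_min FILE KEY MIN: succeed if KEY is set to a number >= MIN.
check_min() {
    value=$(config_value "$1" "$2")
    if [ -n "$value" ] && [ "$value" -ge "$3" ]; then
        echo "$2=$value ok"
    else
        echo "$2 is '${value:-unset}', expected at least $3"
        return 1
    fi
}

# Stand-in for the real .config (assumption: the values from the steps above).
sample_config=$(mktemp)
printf 'CONFIG_SERIAL_8250_NR_UARTS=4\nCONFIG_SERIAL_8250_RUNTIME_UARTS=4\n' \
    > "$sample_config"

check_min "$sample_config" CONFIG_SERIAL_8250_NR_UARTS 4
check_min "$sample_config" CONFIG_SERIAL_8250_RUNTIME_UARTS 4
```

Against a real tree, pass the path of the .config in the kernel source folder instead of the sample file.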
VMWARE ESXi PANIC: Failed to find HD boot partition
I've got problems installing the VMware ESXi Server. The installation finishes without any error messages and prompts me to reboot. After pressing Enter the system reboots. While booting through the yellow loading screen it switches to black and displays the following error message:
PANIC: Failed to find HD boot partition
All modules have been loaded without any errors. After typing unsupported into the console, the busybox shell comes up. I took a look into the /dev/disks directory, but no disk devices are listed there, unlike during the installation process. Switching to the system console during installation, both SATA disks on the MPC51 controller are shown. The controllers are named vmhba0 and vmhba32.
Does anyone know how to solve the problem? The hardware is an ESPRIMO P5615 (nForce4) from Fujitsu-Siemens.
The only solution I have found is to run the server from a thumb drive and use the embedded hard drive to store your virtual servers. This solution worked for me. To do it this way you will need:
A USB thumb drive, 1GB or larger
An active Linux machine (or use a LiveCD option on your PowerEdge, such as Knoppix or Gentoo LiveCD)
Mount your ESXi ISO:
mount -t iso9660 -o loop VMware-VMvisor-InstallerCD-3.5.0_Update_2-110271.i386.iso /mnt/esx
Write the installer file to the thumb drive:
tar xvzf /mnt/esx/install.tgz usr/lib/vmware/installer/VMware-VMvisor-big-3.5.0_Update_2-110271.i386.dd.bz2 -O | bzip2 -d -c | dd of=/dev/sdb
Assumptions here (adjust to your settings):
/dev/sdb is where your thumb drive resides
VMware-VMvisor-InstallerCD-3.5.0_Update_2-110271.i386.iso is the name of your ISO file
usr/lib/vmware/installer/VMware-VMvisor-big-3.5.0_Update_2-110271.i386.dd.bz2 is the name of the dd file in your ISO (run tar ztf /mnt/esx/install.tgz to see what your exact file name is; it should be similar and relatively obvious)
It will take a few minutes to write, and when it's done simply boot off of this thumb drive. The PowerEdge servers have an internal USB port (at least mine does) if aesthetics are important to you.
Source: http://cyborgworkshop.org/2008/08/30/install-vmware-esxi-onto-a-usb-thumbdrive/
EDIT 12/19/2009: ESXi 4.0.0 uses image.tgz instead of install.tgz to store its dd file