Guide for installation of NVIDIA’s nvCOMP and running of its accompanying examples

I don’t understand the instructions given here and here.
Could someone offer a step-by-step guide for installing nvCOMP, using the following assumptions and step format (or an equivalent)?
System info:
Ubuntu 20.04
RTX-3060
NVIDIA driver 470.82.01
CUDA 11.4
GCC 9.4.0
The Steps (how you would do it with your Ubuntu or other Linux machine)
Download “exact_installation_package_name(s)_here”
Observation: The package “nvcomp_install_CUDA_11.x.tgz” from NVIDIA has the exact structure described here. However, this package seems to differ from the “nvcomp” folder obtained with git clone https://github.com/NVIDIA/nvcomp.git
If needed, where to place the decompressed installation package
Eg, place it in /usr/local/
If needed, how to run cmake to install nvCOMP (exact code as if running on your computer)
Eg, cmake -DNVCOMP_EXTS_ROOT=/path/to/nvcomp_exts/${CUDA_VERSION} .. followed by make -j (code from this site)
However, is CUDA_VERSION a literal string or a placeholder for, say, CUDA_11.4?
Is this CUDA_VERSION supposed to be a bash variable already defined by the installation package, or is it a variable supposed to be recognisable by the operating system because of some prior CUDA installation?
Besides, what exactly is nvcomp_exts or what does it refer to?
If needed, the code for specifying the path(s) in ~/.bashrc
If needed, how to cmake the sample codes, ie, in which directory to run the terminal and what exact code to run
The exact folder+code sequence to build and run “high_level_quickstart_example.cpp”, which comes with the installation package.
Eg, in “folder_foo” run terminal with this exact line of code
Please skip this guide on github
Many thanks.

I will answer my own question.
System info
Here is the system information obtained from the command line:
uname -r: 5.15.0-46-generic
lsb_release -a: Ubuntu 20.04.5 LTS
nvcc --version: Cuda compilation tools, release 10.1, V10.1.243
nvidia-smi:
Two Tesla K80 (2-in-1 card) and one GeForce (Gigabyte RTX 3060 Vision 12G rev. 2.0)
NVIDIA-SMI 470.82.01
Driver Version: 470.82.01
CUDA Version: 11.4
cmake --version: cmake version 3.22.5
make --version: GNU Make 4.2.1
lscpu: Xeon CPU E5-2680 V4 @ 2.40GHz - 56 CPU(s)
Observation
Although there are two GPUs installed in the server, nvCOMP only works with the RTX.
The Steps
Perhaps "installation" is a misnomer. One only needs to properly compile the downloaded nvCOMP files and run the resulting executables.
Step 1: The nvCOMP library
Download the nvCOMP library from https://developer.nvidia.com/nvcomp.
The file I downloaded was named nvcomp_install_CUDA_11.x.tgz. And I left the extracted folder in the Downloads directory and renamed it nvcomp.
Step 2: The nvCOMP test package on GitHub
Download it from https://github.com/NVIDIA/nvcomp. Click the green "Code" icon, then click "Download ZIP".
By default, the downloaded zip file is called nvcomp-main.zip. And I left the extracted folder, named nvcomp-main, in the Downloads directory.
Step 3: The NVIDIA CUB library on GitHub
Download it from https://github.com/nvidia/cub. Click the green "Code" icon, then click "Download ZIP".
By default, the downloaded zip file is called cub-main.zip. And I left the extracted folder, named cub-main, in the Downloads directory.
There is no "installation" of the CUB library other than making the folder path "known", ie available, to the calling program.
Comments: The nvCOMP GitHub site did not seem to explain that the CUB library was needed to run nvCOMP, and I only found that out from an error message during an attempted compilation of the test files in Step 2.
Step 4: "Building CPU and GPU Examples, GPU Benchmarks provided on Github"
The nvCOMP GitHub landing page has a section with the exact name as this Step. The instructions could have been more detailed.
Step 4.1: cmake
The Downloads directory now contains the folders nvcomp (the Step 1 nvCOMP library), nvcomp-main (Step 2), and cub-main (Step 3).
Start a terminal and then go inside nvcomp-main, ie, go to /your-path/Downloads/nvcomp-main
Run cmake -DCMAKE_PREFIX_PATH=/your-path/Downloads/nvcomp -DCUB_DIR=/your-path/Downloads/cub-main
This cmake step sets up the build files for the next "make" step.
During cmake, a harmless yellow-colored cmake warning appeared
There was also a harmless printout "-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed" per this thread.
The last few printout lines from cmake variously stated it found Threads, nvcomp, ZLIB (on my system) and it was done with "Configuring" and "Build files have been written".
Step 4.2: make
Run make in the same terminal as above.
This is a screenshot of the make compilation.
Please check the before and after folder tree to see what files have been generated.
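The configure-and-build sequence of Step 4 can be sketched as a small Python helper. It only assembles the two commands described above; the function name and the example path are placeholders for illustration and are not part of nvCOMP.

```python
import shlex
from pathlib import PurePosixPath


def nvcomp_build_commands(downloads_dir):
    """Return the Step 4 commands (cmake, then make) as argument lists.

    Assumes the folders from Steps 1-3 (nvcomp, nvcomp-main, cub-main)
    sit side by side in downloads_dir, as in this answer.
    """
    downloads = PurePosixPath(downloads_dir)
    cmake_cmd = [
        "cmake",
        f"-DCMAKE_PREFIX_PATH={downloads / 'nvcomp'}",
        f"-DCUB_DIR={downloads / 'cub-main'}",
    ]
    make_cmd = ["make"]
    return [cmake_cmd, make_cmd]


# Print the commands; both are meant to be run from inside nvcomp-main.
for cmd in nvcomp_build_commands("/your-path/Downloads"):
    print(shlex.join(cmd))
```

To execute them instead of printing, each list could be passed to subprocess.run(cmd, cwd=".../nvcomp-main", check=True).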
Step 5: Running the examples/benchmarks
Let's run the "built-in" example before running the benchmarks with the (now outdated) Fannie Mae single-family loan performance data from NVIDIA's RAPIDS repository.
Check if there are executables in /your-path/Downloads/nvcomp-main/bin. These are the executables created by the cmake and make steps above.
You can run these executables, which are built with different compression algorithms and functionalities, on the files you want to compress. The name of each executable indicates the algorithm used and/or its functionality.
Some of the executables require the files to be of a certain size, eg, the "benchmark_cascaded_chunked" executable requires the target file's size to be a multiple of 4 bytes. I have not tested all of these executables.
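A size requirement like the multiple-of-4-bytes one can be checked before launching an executable. This is a minimal sketch; the function name is made up for illustration.

```python
import os


def size_is_multiple_of(path, unit=4):
    """Return True if the file size in bytes is a multiple of `unit`."""
    return os.path.getsize(path) % unit == 0
```

For example, size_is_multiple_of("/another-path/2000Q1-col0-long.bin") would tell whether that file satisfies the 4-byte requirement.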
Step 5.1: CPU compression examples
Per https://github.com/NVIDIA/nvcomp
Start a terminal (anywhere)
Run time /your-path/Downloads/nvcomp-main/bin/gdeflate_cpu_compression -f /full-path-to-your-target/my-file.txt
Here are the results of running gdeflate_cpu_compression on an updated Fannie Mae loan data file "2002Q1.csv" (11GB)
Similarly, change the name of the executable to run lz4_cpu_compression or lz4_cpu_decompression
Step 5.2: The benchmarks with the Fannie Mae files from NVIDIA Rapids
Apart from following the NVIDIA instructions here, it seems the "benchmark" executables in the above "bin" directory can be run with "any" file. Just use the executable in the same way as in Step 5.1 and adhere to the particular executable specifications.
Below is one example following the NVIDIA instruction.
Long story short, the nvcomp-main (Step 2) test package contains the files to (i) extract a column of homogeneous data from an outdated Fannie Mae loan data file, (ii) save the extraction in binary format, and (iii) run the benchmark executable(s) on the binary extraction.
The Fannie Mae single-family loan performance data files, old or new, all use "|" as the delimiter. In the outdated Rapids version, the first column, indexed as column "0" in the code (zero-based numbering), contains the 12-digit loan IDs for the loans sampled from the (real) Fannie Mae loan portfolio. In the new Fannie Mae data files from the official Fannie Mae site, the loan IDs are in column 2 and the data files have a csv file extension.
Download the dataset "1 Year" Fannie Mae data, not the "1GB Splits*" variant, by following the link from here, or by going directly to RAPIDS
Place the downloaded mortgage_2000.tgz anywhere and unzip it with tar -xvzf mortgage_2000.tgz.
There are four txt files in /mortgage_2000/perf. I will use Performance_2000Q1.txt as an example.
Check if python is installed on the system
Check if text_to_binary.py is in /nvcomp-main/benchmarks
Start a terminal (anywhere)
As shown below, use the python script to extract the first column, indexed "0", with format long, from Performance_2000Q1.txt, and put the .bin output file somewhere.
Run time python /your-path/Downloads/nvcomp-main/benchmarks/text_to_binary.py /your-other-path-to/mortgage_2000/perf/Performance_2000Q1.txt 0 long /another-path/2000Q1-col0-long.bin
For comparison of the benchmarks, run time python /your-path/Downloads/nvcomp-main/benchmarks/text_to_binary.py /your-other-path-to/mortgage_2000/perf/Performance_2000Q1.txt 0 string /another-path/2000Q1-col0-string.bin
Run the benchmarking executables with the target bin files as shown at the bottom of the web page of the NVIDIA official guide
Eg, /your-path/Downloads/nvcomp-main/bin/benchmark_hlif lz4 -f /another-path/2000Q1-col0-long.bin
Just make sure the operating system knows where the executable and the target file are.
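The idea behind the column extraction in Step 5.2 can be sketched as follows. This is not the actual text_to_binary.py from the nvCOMP repository; it is a simplified illustration that assumes "long" means a 64-bit signed integer written in little-endian order, and the function name is hypothetical.

```python
import struct


def extract_column(text_path, bin_path, column=0, delimiter="|"):
    """Write one integer column of a delimited text file as packed int64 values.

    Returns the number of values written. Blank lines are skipped.
    """
    count = 0
    with open(text_path, "r") as src, open(bin_path, "wb") as dst:
        for line in src:
            if not line.strip():
                continue
            value = int(line.split(delimiter)[column])
            dst.write(struct.pack("<q", value))  # 8 bytes per value
            count += 1
    return count
```

Each value occupies 8 bytes, so the output size is always a multiple of 4 bytes, and the 12-digit loan IDs fit comfortably in a 64-bit integer.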
Step 5.3: The high_level_quickstart_example and low_level_quickstart_example
These two executables are in /nvcomp-main/bin
They are completely self-contained. Just run, eg, high_level_quickstart_example without any input arguments. Please see the corresponding C++ source code in /nvcomp-main/examples and the official nvCOMP guides on GitHub.
Observations after some experiments
This could be another long thread but let's keep it short. Note that NVIDIA used various A-series cards for its benchmarks and I used a GeForce RTX 3060.
Speed
The python script is slow. It took 4m12.456s to extract the loan ID column from an 11.8 GB Fannie Mae data file (with 108 columns) using format "string"
In contrast, R with data.table took 25.648 seconds to do the same.
With the outdated "Performance_2000Q1.txt" (0.99 GB) tested above, the python script took 32.898s whereas R took 26.965s to do the same extraction.
Compression ratio
"Bloated" python outputs.
The R-output "string.txt" files are generally a quarter of the size of the corresponding python-output "string.bin" files.
Applying the executables to the R-output files achieved much better compression ratio and throughputs than to the python-output files.
Eg, running benchmark_hlif lz4 -f 2000Q1-col0-string.bin with the python output vs running benchmark_hlif lz4 -f 2000Q1-col0-string.txt with the R output
Uncompressed size: 436,544,592 vs 118,230,827 bytes
Compressed size: 233,026,108 vs 4,154,261 bytes
Compression ratio: 1.87 vs 28.46
Compression throughput (GB/s): 2.42 vs 18.96
Decompression throughput (GB/s): 8.86 vs 91.50
Wall time: 2.805 s vs 1.281 s
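The compression ratios follow directly from the uncompressed and compressed sizes reported above. A short check, using only those reported byte counts (the function name is illustrative):

```python
def compression_ratio(uncompressed_bytes, compressed_bytes):
    """Ratio of uncompressed to compressed size (dimensionless)."""
    return uncompressed_bytes / compressed_bytes


python_output = compression_ratio(436_544_592, 233_026_108)
r_output = compression_ratio(118_230_827, 4_154_261)
print(f"{python_output:.2f} vs {r_output:.2f}")  # prints: 1.87 vs 28.46
```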
Overall performance: accounting for file size and memory limits
Use of the nvCOMP library is limited by the GPU memory, no more than 12GB for the RTX 3060 tested. And depending on the compression algorithm, an 8GB target file can easily trigger a stop with cudaErrorMemoryAllocation: out of memory
In both speed and compression ratio, pigz trumped the tested nvCOMP executables when the target files were the new Fannie Mae data files containing 108 columns of strings and numbers.

Related

Generate PDF with C++ and Latex

Would it be possible to generate a PDF from C++ source code using LaTeX?
I'm currently using HTML, QWebEngine and QPrinter to create the PDF.
But there are some issues, such as page jumps. LaTeX would be a good solution to ensure that some graphic elements are rendered well.
I am working with Windows only. A cross-platform solution is not needed.

Here are the steps I followed to set up pythontex on my Windows 10 system.
Download MiKTeX
Run the executable
Install time: ~5 minutes on a 16 GB Intel(R) Xeon(R) CPU E3-1505M v5 @ 2.80GHz, 2801 MHz, 4 Core(s), 8 Logical Processor(s)
MiKTeX base size: ~10 MB at **/appdata/local/miktex/*. Note that this may not be where all the files are located; I don't know.
Test whether pdflatex is installed: open a terminal and type pdflatex
Download and extract pythontex
Read instructions at pythontex.pdf.
Install pythontex using pythontex_install.bat
Add pythontex to path.
Run a pythontex example
\documentclass[11pt]{article}%
\usepackage{pythontex}
\usepackage{nopageno}
\begin{document}
\begin{pyconsole}
x = 987.27
x = x**2
\end{pyconsole}
The variable is $x=\pycon{x}$
\end{document}
To compile, run:
pdflatex my-latex.tex
pythontex my-latex.tex
pdflatex my-latex.tex
You may need to install additional packages for it to compile. My final size in appdata/local grew a lot, to 814 MB.
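The three-pass compile sequence above could be driven from a program rather than typed by hand. The following is only a sketch of that idea in Python; the function name is hypothetical, and it assumes pdflatex and pythontex are on the PATH.

```python
import subprocess


def pythontex_build_commands(tex_file):
    """Return the three commands of the compile sequence, in order."""
    return [
        ["pdflatex", tex_file],
        ["pythontex", tex_file],
        ["pdflatex", tex_file],
    ]


def build_pdf(tex_file):
    """Run the sequence, stopping at the first command that fails."""
    for cmd in pythontex_build_commands(tex_file):
        subprocess.run(cmd, check=True)
```

Calling build_pdf("my-latex.tex") would run the same three commands listed above. A C++ application could launch the same commands through whatever process API it already uses.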

How to make GCC create checksum-same builds?

The company where I work has a complicated industrial router project for the ARM architecture, consisting primarily of many C and C++ apps plus the Linux kernel. We are currently preparing for certification, and the certification organization wants us to send them all the sources and the binary checksum of the resulting root filesystem image. Of course, the checksum we send them and the checksum of the image they get after building should be the same.
I tried building the same app (I chose busybox) twice in a row on the same host and got two different checksums. I tried to solve it using the answer at https://superuser.com/a/1092566/851200 (passing -frandom-seed=123 as a compile flag), but it did not help.
If I could build the same app twice on the same host and get the same checksum, I think the problem would be practically solved, because we could tell the certification organization "Build the software on Ubuntu 18.04.3 with GCC 7.5.0-3ubuntu1~18.04, with ARM GNUEABI GCC 4.8.5 built from the sources that we gave you, etc." The base software would then be the same, so the situation would be identical to building on the same system. But maybe I am missing something.
Could anybody help me?
UPDATE:
I tried to see what exactly differs in the resulting binary files using arm-linux-gnueabi-readelf -a and got the following diff for two sequential builds of busybox on the same machine:
--- a 2020-03-24 16:17:51.901192012 +0500
+++ b 2020-03-24 16:18:47.152671408 +0500
@@ -1404,7 +1404,7 @@
Displaying notes found at file offset 0x00000168 with length 0x00000020:
Owner Data size Description
GNU 0x00000010 NT_GNU_BUILD_ID (unique build ID bitstring)
- Build ID: ecc0ddee1a1f50c9b4ac98477be7ba55
+ Build ID: edab8a4ee42f8fd0e4ee7e931639f226
Attribute Section: aeabi
File Attributes
Tag_CPU_name: "5TE"
Then I checked the GCC man page, which says: If style is omitted, "sha1" is used ... The "md5" and "sha1" styles produces an identifier that is always the same in an identical output file. So the defaults should be OK and produce the same Build ID, but they do not.
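When comparing two builds like this, it can help to confirm the checksum mismatch and locate the first differing byte before digging into readelf output. A minimal sketch using only the Python standard library; the function names are made up for illustration.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def first_difference(path_a, path_b, chunk_size=1 << 20):
    """Return the byte offset of the first difference, or None if identical."""
    offset = 0
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            chunk_a, chunk_b = a.read(chunk_size), b.read(chunk_size)
            if chunk_a != chunk_b:
                for i, (x, y) in enumerate(zip(chunk_a, chunk_b)):
                    if x != y:
                        return offset + i
                # One file is a prefix of the other.
                return offset + min(len(chunk_a), len(chunk_b))
            if not chunk_a:
                return None
            offset += len(chunk_a)
```

The returned offset can then be matched against section offsets, such as the note section at file offset 0x00000168 in the readelf output above.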

Openocd Error: invalid command name "dap" - can't connect Blue Pill via ST-Link/V2

I'm using a Blue Pill board (STM32F103CB with 128kB of flash according to st-info --probe) via a clone ST-Link/V2 like this one. I've also tested using a genuine ST-Link/V2 like this one. I get the same result, described below, with both programmers.
My system is Linux (Debian LXDE) and I've installed OpenOCD from Liviu Ionescu's releases here.
My OpenOCD installation is working. As well as the Blue Pill I have a ST-Nucleo-F103RB board, and I can connect to it using OpenOCD. The command
openocd -f board/st_nucleo_f103rb.cfg
using the standard .cfg file that ships with OpenOCD gives
Open On-Chip Debugger 0.10.0
Licensed under GNU GPL v2
For bug reports, read
http://openocd.org/doc/doxygen/bugs.html
Info : The selected transport took over low-level target control. The results might differ compared to plain JTAG/SWD
adapter speed: 1000 kHz
adapter_nsrst_delay: 100
none separate
srst_only separate srst_nogate srst_open_drain connect_deassert_srst
Info : Unable to match requested speed 1000 kHz, using 950 kHz
Info : Unable to match requested speed 1000 kHz, using 950 kHz
Info : clock speed 950 kHz
Info : STLINK v2 JTAG v29 API v2 SWIM v18 VID 0x0483 PID 0x374B
Info : using stlink api v2
Info : Target voltage: 3.271135
Info : stm32f1x.cpu: hardware has 6 breakpoints, 4 watchpoints
But I still haven't managed to connect to my Blue Pill using the ST-Link/V2 programmers. I've read everything I can find, including relevant sections of https://elinux.org/Category:OpenOCD and as much as I can personally digest of http://openocd.org/doc/. The following is where I've got to.
The .cfg file stm32f103c8_blue_pill.cfg doesn't work for me. It produces the output described below.
Based on what I've read I've prepared my own .cfg file at ../board/stm32f103.cfg. It says:
source [find interface/stlink.cfg]
transport select hla_swd
source [find target/stm32f1x.cfg]
#source [find board/stm32f103c8_blue_pill.cfg]
#reset_config srst_only
#reset_config none separate
Sources I've read suggest this should work, but it doesn't. Using my .cfg described above, I can use either target/stm32f1x.cfg or board/stm32f103c8_blue_pill.cfg, and I still get the same output as described below. (In both cases I'm using the standard files, as shipped with OpenOCD.) I've tested with both of the reset_config variants shown above, and with neither. None of the combinations works.
The file interface/stlink.cfg that I'm using is modified. I've changed it to state the correct device_desc "ST-LINK/V2" and the correct vid_pid 0x0483 0x3748. (Both confirmed using lsusb.) So, ignoring commented lines, stlink.cfg reads
interface hla
hla_layout stlink
hla_device_desc "ST-LINK/V2"
hla_vid_pid 0x0483 0x3748
I've experimented with including the hla_serial of the programmer. Interestingly, lsusb can't find the full serial number. st-info --probe finds the serial number, but gives a slightly different number from the STLinkUpgrade firmware application. I've tried using both serial numbers. No difference.
Here's the command I give to OpenOCD:
openocd -s ~/stm32/openocd/scripts -f board/stm32f103.cfg
Notice that I have to set the path using -s for this command. With the ST-Nucleo-F103RB board, I don't have to do this. With the stm32f103.cfg file, however, if I don't set the path I get:
Error: Can't find board/stm32f103.cfg
in procedure 'script'
If I use the full command shown above, with -s to set the path, I get:
Open On-Chip Debugger 0.10.0
Licensed under GNU GPL v2
For bug reports, read
http://openocd.org/doc/doxygen/bugs.html
/[..]stm32/openocd/scripts/target/stm32f1x.cfg:47: Error: invalid command name "dap"
in procedure 'script'
at file "embedded:startup.tcl", line 60
at file "/[..]stm32/openocd/scripts/board/stm32f103.cfg", line 18
at file "/[..]stm32/openocd/scripts/target/stm32f1x.cfg", line 47
Here's the offending line 47 of stm32f1x.cfg:
dap create $_CHIPNAME.dap -chain-position $_CHIPNAME.cpu
I've searched Stack Overflow and similar sites for items about Error: invalid command name "dap". Using the OpenOCD documentation I understand that the dap create command exists, and roughly what it does. The most similar reported error I've found documented is at https://elinux.org/OpenOCD_Troubleshooting:_Invalid_Command_Name_JTAG, and the solution suggested there doesn't seem to be applicable because I'm not invoking interface/stlink.cfg from the command line.
I can't see what I'm doing wrong, and I'm now completely stuck. If someone can give me a steer I'd be really grateful. Sorry it's such a long post.

I just encountered this problem too. Officially, no binaries are provided, only source code. But there are two sites that release binaries and are recommended by the official OpenOCD project:
1. Maintained by Freddie Chopin.
2. Maintained by Liviu Ionescu in Github.
I tried the latest version (OpenOCD 0.10.0, commit date: 2017-01-22 20:31:28, build date: 2017-01-23) released on Freddie Chopin's site, and I encountered this Error: invalid command name "dap" problem. But all the *.cfg files I referenced had run normally on another computer of mine with another OpenOCD binary (although I forgot where I downloaded that binary from).
Not sure what went wrong, so I turned to the latest version (gnu-mcu-eclipse-openocd-0.10.0-11-20190118-1134-win64.zip) released by GNU MCU Eclipse (maintained by Liviu Ionescu). The error was gone and the problem was solved.
PS: I'm not saying there is a bug in Freddie Chopin's build, but if you encounter this problem, you may be able to solve it by trying the version that is currently actively maintained.

I agree with Wulfric. Using the standard install for OpenOCD
sudo apt install openocd
gave the "dap" error.
However, the GitHub version, openocd-xpack, worked correctly.
Using:
Linux clamps 4.15.0-66-generic #75-Ubuntu SMP ... as remote, Windows 8/10 as client target MIMXRT1010-EVK

Installing HDF5 library on Cygwin: "make check" stuck at testswmr.sh, no error message

I am currently installing the HDF5 library, more precisely the hdf5-1.10.0-patch1, on Cygwin, as I want to use it with Fortran. Following the instructions from the hdfgroup website
(here is the link), I did the following:
./configure --enable-fortran
make > "out1_check.txt" 2> "warn1_check.txt" &
make check > "out2_check.txt" 2> "warn2_check.txt" &
The execution of the last command (make check) proceeds as it should, until it gets stuck. The process does not stop and something is happening (8-12% CPU are in use by sh.exe, already 39 hours of CPU time) but "out2_check.txt" looks like
Making check in src
...
[many successful checks]
...
============================
No need to test testlinks_env.sh again.
============================
============================
Testing testswmr.sh
Unfortunately, I do not have the output file from the first run of make check, but it did not contain more information on Testing testswmr.sh. There was never any error message.
So, what is this testswmr.sh, why does it get stuck and how can I finalize the installation process? Maybe I can skip the remaining checks and just proceed to make install?
Important note: an older version of HDF5 is already installed from the Cygwin repo. It does not seem to support Fortran however, so I decided to install the current version myself.
Available (and used) compilers are gcc and gfortran.
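One way to keep a hung test from consuming CPU for days is to run the test command under a time limit. A minimal Python sketch of that idea; the function name and the two-hour limit in the usage note are arbitrary choices, not something from the HDF5 documentation.

```python
import subprocess


def run_with_time_limit(command, limit_seconds):
    """Run command; return its exit code, or None if the limit was exceeded."""
    try:
        completed = subprocess.run(command, timeout=limit_seconds)
    except subprocess.TimeoutExpired:
        return None
    return completed.returncode
```

For example, run_with_time_limit(["make", "check"], 2 * 60 * 60) returns None if the run takes longer than two hours. Note that on a timeout subprocess.run only kills the process it started directly, so test scripts spawned by make may need to be stopped separately.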

As far as I can tell, only Intel Fortran is supported on Windows. There is no Cygwin download here https://support.hdfgroup.org/HDF5/release/obtain518.html and I have never come across a report of experience for Cygwin/Fortran/HDF5.
Your options:
Use Intel Fortran
Use Linux or Mac
Sorry

Compiling on Vortex86: "Illegal instruction"

I'm using an embedded PC which has a Vortex86-SG CPU, running Ubuntu 10.04 with kernel 2.6.34.10-vortex86-sg. Unfortunately we can't compile a new kernel, because we don't have any source code, not even drivers or patches.
I have to run a small project written in C++ with OpenFrameworks. The framework compiles correctly with each script in of_v0071_linux_release/scripts/linux/ubuntu/install_*.sh.
I noticed that in order to compile against Vortex86/Ubuntu 10.04, the following options must be added in every config.make file:
USER_CFLAGS = -march=i486
USER_LDFLAGS = -lGLEW
Indeed, it compiles without errors, but the generated binary doesn't start at all:
root@jb:~/openframeworks/of_v0071_linux_release/apps/myApps/emptyExample/bin# ./emptyExample
Illegal instruction
root@jb:~/openframeworks/of_v0071_linux_release/apps/myApps/emptyExample/bin# echo $?
132
Strace last lines:
munmap(0xb77c3000, 4096) = 0
rt_sigprocmask(SIG_BLOCK, [PIPE], NULL, 8) = 0
--- SIGILL (Illegal instruction) @ 0 (0) ---
+++ killed by SIGILL +++
Illegal instruction
root@jb:~/openframeworks/of_v0071_linux_release/apps/myApps/emptyExample/bin#
Any idea to solve this problem?
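The exit status 132 printed by echo $? is consistent with the strace output: shells conventionally report 128 plus the signal number when a process is killed by a signal, and 132 - 128 = 4, which is SIGILL on Linux. A small sketch of that decoding (the function name is illustrative):

```python
import signal


def signal_from_exit_status(status):
    """Map a shell exit status above 128 to a signal name, else None."""
    if status <= 128:
        return None
    try:
        return signal.Signals(status - 128).name
    except ValueError:
        # No signal with that number on this platform.
        return None


print(signal_from_exit_status(132))  # on Linux: SIGILL
```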

I know I am a bit late on this but I recently had my own issues trying to compile the kernel for the vortex86dx. I finally was able to build the kernel as well. Use these steps at your own risk as I am not a Linux guru and some settings you may have to change to your own preference/hardware:
Download and use a Linux distribution that runs on a similar kernel version that you plan on compiling. Since I will be compiling Linux 2.6.34.14, I downloaded and installed Debian 6 on virtual box with adequate ram and processor allocations. You could potentially compile on the Vortex86DX itself, but that would likely take forever.
Made sure I had the dependencies: #apt-get install ncurses-dev kernel-package
Download kernel from kernel.org (I grabbed Linux-2.6.34.14.tar.xz). Extract files from package.
Grab the config file from the DMP FTP site: ftp://vxmx:gc301@ftp.dmp.com.tw/Linux/Source/config-2.6.34-vortex86-sg-r1.zip. Please note the vxmx user name. Copy the config file to the freshly extracted Linux source folder.
Grab the patch at ftp://vxdx:gc301@ftp.dmp.com.tw/Driver/Linux/config%26patch/patch-2.6.34-hda.zip. Please note the vxdx user name. Copy it to the kernel source folder.
Patch Kernel: #patch -p1 < patchfilename
configure kernel with #make menuconfig
Load Alternate Configuration File
Enable generic x86 support
Enable Math Emulation
I disabled generic IDE support because I will be using legacy mode (selectable in the BIOS)
Under Device Drivers -> Ethernet (10 or 100Mbit) -> Make sure RDC R6040 Fast Ethernet Adapter Support is selected
USB support -> Select Support for Host-side USB, EHCI HCD (USB 2.0) support, OHCI HCD support
Save the config as .config
Check the serial ports: edit .config manually and make sure CONFIG_SERIAL_8250_NR_UARTS=4 (or more if you have additional ports) and CONFIG_SERIAL_8250_RUNTIME_UARTS=4 (or more if you have additional ports). If you are going to use more than 4 serial ports, make sure CONFIG_SERIAL_8250_MANY_PORTS is set.
compile kernel headers and source: #make-kpkg --initrd kernel_image kernel_source kernel_headers modules_image
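The manual serial-port check on .config can also be scripted. The following is a minimal sketch: it parses the NAME=value lines of a .config file and applies the conditions described in the steps above. The function names are made up for illustration.

```python
def read_kconfig(text):
    """Parse 'CONFIG_NAME=value' lines of a kernel .config into a dict."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and '# CONFIG_X is not set' comments
        name, sep, value = line.partition("=")
        if sep:
            options[name] = value
    return options


def serial_ports_ok(options, wanted=4):
    """Check the 8250 UART options for the wanted number of serial ports."""
    nr = int(options.get("CONFIG_SERIAL_8250_NR_UARTS", 0))
    runtime = int(options.get("CONFIG_SERIAL_8250_RUNTIME_UARTS", 0))
    if nr < wanted or runtime < wanted:
        return False
    if wanted > 4 and options.get("CONFIG_SERIAL_8250_MANY_PORTS") != "y":
        return False
    return True
```

A typical use would be serial_ports_ok(read_kconfig(open(".config").read())) from inside the kernel source folder.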
