CDH Server keeps failing when started for the first time - CentOS 7

The CDH cluster was configured on CentOS 7. The first time the CDH Server was started, it kept failing. At first I thought the machine did not have enough memory, but increasing it to 16 GB did not help. Later I checked /opt/cloudera-manager/cm-5.4.3/log/cloudera-scm-server/cloudera-scm-server.out, which shows:
JAVA_HOME=/opt/jdk-18.0.1.1
Unrecognized VM option 'UseConcMarkSweepGC'
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
Java itself is installed and works normally, so I don't know what is going on.
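For what it's worth, the decisive line in that log is the "Unrecognized VM option" one. The CMS collector behind UseConcMarkSweepGC was removed from the JDK in version 14 (JEP 363), so a JDK 18 JVM refuses to start when that flag is passed, even though Java itself is fine. Below is a minimal sketch for pulling the rejected options out of such a log, assuming the log format shown above; the function name is hypothetical and not part of Cloudera Manager.

```python
import re

# Matches lines such as: Unrecognized VM option 'UseConcMarkSweepGC'
_UNRECOGNIZED = re.compile(r"Unrecognized VM option '([^']+)'")


def find_unrecognized_vm_options(log_text):
    """Return the VM option names the JVM rejected, in order of appearance."""
    return _UNRECOGNIZED.findall(log_text)


# Sample copied from the log excerpt in the question.
sample_log = """JAVA_HOME=/opt/jdk-18.0.1.1
Unrecognized VM option 'UseConcMarkSweepGC'
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
"""

print(find_unrecognized_vm_options(sample_log))
```

If the list is non-empty, the usual choices are to point JAVA_HOME at an older JDK that still accepts the option, or to remove the option from the Java options the server is started with.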

Related

Cannot find NVIDIA driver after stopping and starting a Deep Learning VM

[TL;DR] First, wait a couple of minutes and check whether the NVIDIA driver starts to work properly. If not, stop and start the VM instance again.
I created a Deep Learning VM (Google Click to Deploy) with an A100 GPU. After stopping and starting the instance, when I ran nvidia-smi, I got the following error message:
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
But if I type which nvidia-smi, I get
/usr/bin/nvidia-smi
It seems the driver is there but cannot be used. Can someone suggest how to enable the NVIDIA driver after stopping and starting a Deep Learning VM? The first time I created and opened the instance, the driver was installed automatically.
The system information is (using uname -m && cat /etc/*release):
x86_64
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
I tried the installation script from GCP. First I ran
curl https://raw.githubusercontent.com/GoogleCloudPlatform/compute-gpu-installation/main/linux/install_gpu_driver.py --output install_gpu_driver.py
and then ran
sudo python3 install_gpu_driver.py
which gives the following message:
Executing: which nvidia-smi
/usr/bin/nvidia-smi
Already installed.
After I posted the question, the NVIDIA driver started to work properly once I had waited a couple of minutes.
In the following days, I tried stopping and starting the VM instance multiple times. Sometimes nvidia-smi works right away; sometimes it still does not work after more than 20 minutes of waiting. My current best answer to this question is to wait several minutes first. If nvidia-smi still does not work, stop and start the instance again.
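The wait-and-retry advice can be sketched as a small polling helper. This is only an illustration: the function name, the number of attempts and the delay are assumptions, and it relies on nvidia-smi returning a non-zero exit status while it cannot talk to the driver.

```python
import subprocess
import time


def wait_for_command(cmd, attempts=10, delay_s=30.0, sleep=time.sleep):
    """Run cmd repeatedly until it exits with status 0.

    Returns True as soon as one run succeeds, False if every attempt fails.
    The sleep function is injectable so the helper can be tested quickly.
    """
    for attempt in range(attempts):
        try:
            result = subprocess.run(
                cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
            )
            if result.returncode == 0:
                return True
        except FileNotFoundError:
            pass  # binary not on PATH; count it as a failed attempt
        if attempt < attempts - 1:
            sleep(delay_s)  # wait before the next try, but not after the last
    return False


# Example call (not executed here): up to 10 tries, 30 s apart.
# driver_ok = wait_for_command(["nvidia-smi"])
```

If the call returns False, the next step from the answer above applies: stop and start the instance again.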
What worked for me (I am not sure it will survive the next restarts) was to remove all drivers with sudo apt remove --purge '*nvidia*', and then force the installation with sudo python3 install_gpu_driver.py.
In install_gpu_driver.py, change line 230, inside the check_driver_installed function, to return False. Then run the script.
Anyone using Docker may hit the error docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]] and have to reinstall Docker too. This thread helped me.

Unable to Access Jar File When Running Akka

I am using the tutorial provided at this link: https://developer.lightbend.com/guides/akka-quickstart-scala/index.html#running-the-example
It is a very basic tutorial on how to run a Hello World program using commands in Akka. I downloaded and extracted the zip file, navigated to the folder in the command prompt, and ran the sbt.bat command, but I get this output as a result:
C:\akka-quickstart-scala>.\sbt-dist\bin\sbt.bat
Error: Unable to access jarfile C:\akka-quickstart-scala\sbt-dist\bin\java9-rt-export.jar
The filename, directory name, or volume label syntax is incorrect.
Error: Unable to access jarfile C:\akka-quickstart-scala\sbt-dist\bin\java9-rt-export.jar
Unrecognized VM option 'MaxPermSize=256m'
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
For context, I am a university student taking a distributed operating systems class. Everyone else in the class got it to work except me. Even the professor couldn't figure out why it wasn't running on my machine. Any help would be greatly appreciated.
It seems the project is using a JVM setting (MaxPermSize) that newer Java versions no longer recognize. You could try downgrading the Java version on your machine to Java 8.
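Before downgrading, it may help to confirm which Java version the launcher actually picks up. The sketch below parses the text printed by java -version (which goes to stderr); the function name is hypothetical, and the two sample strings are illustrative rather than taken from the question.

```python
import re


def java_major_version(version_output):
    """Extract the major Java version from `java -version` output.

    Handles the legacy "1.8.0_281" scheme (major version 8) and the newer
    "17.0.1" scheme (major version 17). Returns None if no version string
    is found.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    second = match.group(2)
    if first == 1 and second is not None:
        return int(second)  # legacy scheme: 1.<major>.x
    return first


print(java_major_version('java version "1.8.0_281"'))
print(java_major_version('openjdk version "17.0.1" 2021-10-19'))
```

Java 8 still starts when MaxPermSize is passed (it prints a warning that the option is ignored), which is why the downgrade gets past this particular error.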

ECS Fargate fails randomly without logs, with container ExitCode 139

I am trying to deploy an ETL pipeline on AWS.
The pipeline architecture consists of a Step Function that manages the whole pipeline: 10 parallel (independent) Fargate tasks, invoked using a Map state in the Step Function, and a Lambda (Error Notifier).
I used Python 3.7 as the base image to build my Docker image. The pipeline deploys successfully, but it sometimes fails at random with container exit code 139.
After further research, I found out that this is a SIGSEGV error (signal 11 on Linux), which indicates an invalid memory access.
I am not sure how to rectify this error.
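The link between exit code 139 and signal 11 follows the common convention that a process killed by signal N is reported with exit code 128 + N. A minimal sketch of that arithmetic, with a hypothetical helper name:

```python
import signal


def describe_exit_code(code):
    """Map a container exit code to a signal name when it encodes one.

    By convention, a process killed by signal N is reported with exit
    code 128 + N. Returns None for codes that do not encode a known signal.
    """
    if code <= 128:
        return None
    try:
        return signal.Signals(code - 128).name
    except ValueError:
        return None  # no signal with that number on this platform


print(describe_exit_code(139))  # 139 - 128 = 11
```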
[UPDATE]
We recently found out that the problem was that the image was built on an x86_64 Mac and deployed on an x86_64 Ubuntu system, which is why we got the SIGSEGV error (signal 11 on Linux, exit code 139 in Docker). We redeployed the images using AWS CodeBuild, and everything has worked perfectly fine since then.

Error while building l2fwd in DPDK

I am trying to build the l2fwd module in DPDK and I keep getting this error. I am using Oracle VirtualBox with Ubuntu 16.04.
l2fwd example application fails with the following message:
WARNING: Master core has no memory on local socket!
error allocating rte services array
It looks like you missed the huge page setup step. Basically, you need to allocate huge pages for the application with the following command (run as root):
# echo 512 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
You can find more details in the DPDK Getting Started Guide.
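To check whether the reservation took effect, the HugePages_Total and Hugepagesize fields of /proc/meminfo can be inspected. The sketch below parses text in that format; the helper name is hypothetical, and the sample values simply mirror the 512 pages of 2048 kB from the command above.

```python
import re


def hugepage_summary(meminfo_text):
    """Parse huge page fields from /proc/meminfo-style text.

    Returns (page_count, page_size_kb, total_mib), or None if either
    field is missing.
    """
    total = re.search(r"^HugePages_Total:\s+(\d+)", meminfo_text, re.M)
    size = re.search(r"^Hugepagesize:\s+(\d+) kB", meminfo_text, re.M)
    if total is None or size is None:
        return None
    count = int(total.group(1))
    size_kb = int(size.group(1))
    return count, size_kb, count * size_kb // 1024


# Illustrative sample: 512 pages of 2048 kB each.
sample = """HugePages_Total:     512
HugePages_Free:      512
Hugepagesize:       2048 kB
"""
print(hugepage_summary(sample))
```

On a real machine, the same function can be fed the contents of /proc/meminfo. A page count of 0 would suggest the echo command did not take effect.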

Hyperledger Composer "Error trying to instantiate composer runtime. Error: Error: Invalid results returned ::NOT_FOUND" on Windows 10

I'm trying to run the following blockchain tutorial on Windows 10
https://hyperledger.github.io/composer/tutorials/developer-guide.html
Everything went fine (all npm installs, fabric tools, download fabric, start fabric, create the project archive, composer runtime install) until I ran the following command:
composer network start --card PeerAdmin@hlfv1 --networkAdmin admin --networkAdminEnrollSecret adminpw --archiveFile tutorial-network@0.0.1.bna --file networkadmin.card
and got this error:
Starting business network from archive: tutorial-network@0.0.1.bna
Business network definition:
Identifier: tutorial-network@0.0.1
Description: Tuto DevWorks
× Starting business network definition. This may take a minute...
Error: Error trying to instantiate composer runtime. Error: Error: Invalid results returned ::NOT_FOUND
Command failed
Previous successful command was:
composer runtime install --card PeerAdmin@hlfv1 --businessNetworkName tutorial-network
√ Installing runtime for business network tutorial-network. This may take a minute...
Command succeeded
My configuration is:
WIN 10
Node v8.9.1
npm 5.5.1
Docker version 17.09.0-ce, build afdb6d4
git version 2.14.1.windows.1
I also tried with another bna archive and had the same issue.
Thanks for your help!
If this message occurs at the stage of instantiating the Composer chaincode, check that a Fabric channel is defined and that the peers have joined it.
We don't, at present, support Windows 10. We support the platforms shown here:
https://hyperledger.github.io/composer/unstable/installing/development-tools.html
I faced this issue in a macOS environment. I found that I had not created the channel and joined the peers. Once I did that, I could start the business network and ping it.