`hdfs dfs -put` command is not executing properly in HDFS

I am trying to execute the following command, but it returns an error saying "unknown command":
hdfs dfs -put /home/cloudera/testfile1 /user/cloudera/input
I have already created the /user/cloudera/input directory correctly.

It looks like the Hadoop installation directory is not in your PATH.
Try adding the following to $HOME/.bashrc and see if it works:
# Set Hadoop-related environment variables
export HADOOP_HOME=<Put your HADOOP_HOME_DIR_PATH here>
# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin
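After updating .bashrc, a quick way to verify the change (assuming a bash shell and that HADOOP_HOME now points at your installation) is to reload the file and check that the hdfs binary is found:
source $HOME/.bashrc
which hdfs
hdfs version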

Related

Command line interface (CLI) not working after mounting lb3 to lb4 as documented

I mounted my lb3 app into an lb4 app as documented, but now I cannot use the lb CLI and I get the following error: "Warning: Found no data sources to attach model. There will be no data-access methods available until datasources are attached.".
This happens because the CLI looks for the JSON configuration file in the root directory and not in the lb3app directory, as advised in the documentation above.
How can I tell the CLI that the configuration files are inside the subdirectory lb3app instead of the parent directory newlb4app?
I tried to execute lb from newlb4app and from the subdirectory lb3app, with no success.
I removed the file .yo-rc.json and that solved the problem. It seems the CLI looks for that file in parent directories and, if it exists, sets that location as the project root directory.
Once I deleted the file, the current directory is treated as the project root instead of the parent directory.
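A minimal sketch of that cleanup, assuming the stray .yo-rc.json sits in the parent directory newlb4app (adjust the path to wherever the file actually lives):
cd newlb4app
ls -a | grep yo-rc
rm .yo-rc.json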

I am trying to import a CSV file into HDFS. I am getting an error that states: -cp: Not enough arguments: expected 2 but got 1

I am using PuTTY SSH to import my CSV file into Hue on AWS HDFS.
So far I have made a directory using the command
hadoop fs -mkdir /data
After creating the directory, I am trying to import my CSV file using the command:
hadoop fs -cp s3://cis4567-fall19/Hadoop/SalesJan2009.csv
However, I am getting an error that states:
-cp: Not enough arguments: expected 2 but got 1
In your command, the destination HDFS directory to which the file needs to be copied is missing.
The syntax is hadoop fs -cp <source_HDFS_dir/file> <destination_HDFS_dir/>
The correct command should be
hadoop fs -cp s3://cis4567-fall19/Hadoop/SalesJan2009.csv /tmp
Please note that I've mentioned the destination as /tmp just as an example. You can replace it with the required destination directory name.
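For example, once the copy finishes you can confirm the file landed in the destination (here /tmp, as used above) with:
hadoop fs -ls /tmp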

Run make command in Jenkins

I'm trying to build a C++ project.
When I run the make command in a terminal it works,
but when I run it through Jenkins, it reports that files are missing.
What could the problem be, and how can I solve it?
The Error:
+ make
make -f enclave_lib.mk SGX_DEBUG=1
make[1]: Entering directory '/home/yoni/Documents/private_ledger-tp/CryptoLib'
mt19937ar.c:44:19: fatal error: stdio.h: No such file or directory
From your comments, the problem is that Jenkins is executed as the root user and cannot find the header stdio.h.
To fix this you have several options:
locate stdio.h
Run this command as your own user; it will give you the path to stdio.h, which you can then feed into your make invocation (see the sketch after this list).
sudo apt-get install build-essential
As the root user, install build-essential; that should install the missing dependency.
Execute Jenkins with your own privileges, not with root privileges.
In your build process, switch to your own account (su youruser).
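A minimal sketch of the first option, assuming the project's Makefile honours CFLAGS (the /usr/include path below is only a typical locate result, not taken from the question):
locate stdio.h
# e.g. /usr/include/stdio.h, then pass its directory to make:
make CFLAGS="-I/usr/include"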
It turned out that in our case it was an issue of environment variables.
What I did to solve it:
Dump the environment variables both from the terminal and from Jenkins, writing them sorted into two files.
Compare the two files with meld.
Any variables that seemed relevant and were present in the terminal environment but missing in Jenkins, I placed into the /etc/environment file (Jenkins picks up additional environment variables from there).
env | sort > envInTerminal.txt
env | sort > envInJenkins.txt
meld envInTerminal.txt envInJenkins.txt
sudo gedit /etc/environment
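For illustration, a variable copied into /etc/environment is written as a plain KEY=value line (the SGX SDK path below is only a hypothetical example, not taken from the question):
SGX_SDK=/opt/intel/sgxsdk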

How to set the Python interpreter that Spark workers use?

How do I set the Python interpreter that Spark workers use?
I have tried several methods, for example:
1) Setting environment variables:
export PYSPARK_DRIVER_PYTHON=/python_path/bin/python
export PYSPARK_PYTHON=/python_path/bin/python
This does not work. I am sure the PYSPARK_DRIVER_PYTHON and PYSPARK_PYTHON variables were set successfully; I checked with:
env | grep PYSPARK_PYTHON
I want PySpark to use
/python_path/bin/python
as the Python interpreter it starts with,
but the workers still start with:
python -m daemon
I don't want to symlink the default python to /python_path/bin/python, because that may affect other developers: the default python and /python_path/bin/python are different versions, and both are in production use.
Setting these in spark-env.sh does not work either:
spark.pyspark.driver.python=/python_path/bin/python
spark.pyspark.python=/python_path/bin/python
When the driver starts, it logs warnings like:
conf/spark-env.sh: line 63: spark.pyspark.driver.python=/python_path/bin/python: No such file or directory
conf/spark-env.sh: line 64: spark.pyspark.python=/python_path/bin/python: No such file or directory
1) Check the permissions on your Python directory. Maybe Spark doesn't have the correct permissions. Try: sudo chmod -R 777 /python_path/bin/python
2) The Spark documentation says:
The property spark.pyspark.python takes precedence if it is set.
So also try setting spark.pyspark.python in conf/spark-defaults.conf (see the sketch after this list).
3) If you use a cluster with more than one node, you also need to check that Python is installed in the correct directory on each node, because you don't know where the workers will be started.
4) Spark will use the first Python interpreter available on your system PATH, so as a workaround you can make your interpreter the first python found on the PATH.
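A minimal sketch of option 2 plus the environment-variable route, assuming a standard Spark layout (the interpreter path is the one from the question; the property and variable names come from the Spark docs):
# conf/spark-defaults.conf (property and value separated by whitespace)
spark.pyspark.python /python_path/bin/python
spark.pyspark.driver.python /python_path/bin/python
# conf/spark-env.sh is a shell script, which is why bare properties there fail
# with "No such file or directory"; only exported variables belong in it:
export PYSPARK_PYTHON=/python_path/bin/python
export PYSPARK_DRIVER_PYTHON=/python_path/bin/python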

No wildcard support in hdfs dfs put command in Hadoop 2.3.0-cdh5.1.3?

I'm trying to move my daily Apache access log files to a Hive external table by copying the daily log files to the relevant HDFS folder for each month.
I tried to use a wildcard, but it seems that hdfs dfs doesn't support it? (The documentation seems to say that it should be supported.)
Copying individual files works:
$ sudo HADOOP_USER_NAME=myuser hdfs dfs -put "/mnt/prod-old/apache/log/access_log-20150102.bz2" /user/myuser/prod/apache_log/2015/01/
But all of the following ones throw "No such file or directory":
$ sudo HADOOP_USER_NAME=myuser hdfs dfs -put "/mnt/prod-old/apache/log/access_log-201501*.bz2" /user/myuser/prod/apache_log/2015/01/
put: `/mnt/prod-old/apache/log/access_log-201501*.bz2': No such file or directory
$ sudo HADOOP_USER_NAME=myuser hdfs dfs -put /mnt/prod-old/apache/log/access_log-201501* /user/myuser/prod/apache_log/2015/01/
put: `/mnt/prod-old/apache/log/access_log-201501*': No such file or directory
The environment is on Hadoop 2.3.0-cdh5.1.3
I'm going to answer my own question.
So hdfs dfs -put does work with wildcards; the problem is that the input directory is not a local directory but a mounted SSHFS (FUSE) drive.
It seems that SSHFS is what cannot handle the wildcard characters.
Below is proof that hdfs dfs -put works just fine with wildcards when using the local filesystem instead of the mounted drive:
$ sudo HADOOP_USER_NAME=myuser hdfs dfs -put /tmp/access_log-201501* /user/myuser/prod/apache_log/2015/01/
put: '/user/myuser/prod/apache_log/2015/01/access_log-20150101.bz2': File exists
put: '/user/myuser/prod/apache_log/2015/01/access_log-20150102.bz2': File exists
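A workaround consistent with the proof above is to stage the files on the local filesystem first and only then put them with the wildcard, for example (assuming /tmp has enough free space and that the shell can still expand the glob on the SSHFS mount; paths are the ones from the question):
cp /mnt/prod-old/apache/log/access_log-201501*.bz2 /tmp/
sudo HADOOP_USER_NAME=myuser hdfs dfs -put /tmp/access_log-201501*.bz2 /user/myuser/prod/apache_log/2015/01/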