In order to access HDFS, I unknowingly ran the following commands as the root user (I was trying to resolve the error shown further below):
sudo su - hdfs
hdfs dfs -mkdir /user/root
hdfs dfs -chown root:hdfs /user/root
exit
Now when I try to access HDFS, it says:
Call From headnode.name.com/192.168.21.110 to headnode.name.com:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
What can I do to resolve this issue? It would also be great if you could explain what the command 'hdfs dfs -chown root:hdfs /user/root' does.
I am using HDP 3.0.1.0 (Ambari)
It seems like your HDFS is down. Check whether your NameNode is up.
The command hdfs dfs -chown root:hdfs /user/root changes the ownership of the HDFS directory /user/root (if it exists) to user root and group hdfs. The hdfs user should be able to run this command (or any HDFS command, for that matter). The "root" user of HDFS is hdfs.
If you want to make user root an HDFS superuser, you can change the group of the root user to hdfs by running (as root) usermod -g hdfs root and then running (as the hdfs user) hdfs dfsadmin -refreshUserToGroupsMappings. This will sync the server's user-to-group mappings with HDFS, making user root a superuser; see the sketch below.
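Since the cluster is managed by Ambari, the quickest check is whether the NameNode service shows as started in the Ambari UI. Below is a minimal command-line sketch of the same check plus the superuser change described above, assuming it is run on the NameNode host:
# check whether a NameNode process is running on this host
ps -ef | grep -i '[n]amenode'
# as the hdfs user: if the NameNode is up, this reports cluster capacity
# instead of "Connection refused"
sudo -u hdfs hdfs dfsadmin -report
# as root: put root into the hdfs group (note: usermod -g replaces root's primary group)
usermod -g hdfs root
# as the hdfs user: refresh HDFS's view of the user-to-group mappings
sudo -u hdfs hdfs dfsadmin -refreshUserToGroupsMappings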
Related
I am not able to run any hadoop command. Even -ls is not working and gives the same error, and I am not able to create a directory using:
hadoop fs -mkdir users
You can create a directory in HDFS using the command:
$ hdfs dfs -mkdir "path"
and then use the command below to copy data from the local file system to HDFS:
$ hdfs dfs -put /root/Hadoop/sample.txt /"your_hdfs_dir_path"
Alternatively, you can use the command below:
$ hdfs dfs -copyFromLocal /root/Hadoop/sample.txt /"your_hdfs_dir_path"
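To confirm the copy, you can list the target directory (same placeholder path as above):
$ hdfs dfs -ls /"your_hdfs_dir_path"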
I'm using Cloudera Quickstart VM 5.12
I have a Flume agent moving CSV files from a spooldir source into an HDFS sink. The operation works fine, but the imported files have:
User=flume
Group=cloudera
Permissions=-rw-r--r--
The problem starts when I use Pyspark and get:
PriviledgedActionException as:cloudera (auth:SIMPLE)
cause:org.apache.hadoop.security.AccessControlException: Permission denied:
user=cloudera, access=EXECUTE,
inode=/user/cloudera/flume/events/small.csv:cloudera:cloudera:-rw-r--r--
(Ancestor /user/cloudera/flume/events/small.csv is not a directory).
If I use "hdfs dfs -put ..." instead of Flume, the user and group are "cloudera", the permissions are 777, and there is no Spark error.
What is the solution? I cannot find a way to change the files' permissions from Flume. Maybe my approach is fundamentally wrong.
Any ideas?
Thank you
I am trying to use the pywebhdfs module in Python to interact with the Hortonworks Hadoop sandbox. I tried the following three commands:
from pywebhdfs.webhdfs import PyWebHdfsClient
hdfs = PyWebHdfsClient(user_name="root",port=50070,host="localhost")
hdfs.make_dir('/newDirectory')
I get the following error on running the last command:
ConnectionError: ('Connection aborted.', error(10035, 'A non-blocking socket operation could not be completed immediately'))
The sandbox is running, and I am able to create directories directly on it using PuTTY. However, it doesn't work through Python.
Can someone help with this error?
I believe 'root' cannot create a directory at the root ('/') of HDFS, since the 'root' user is not an HDFS superuser unless, of course, you changed that.
Could you confirm whether you can create '/newDirectory' as the root user? Alternatively, create the directory somewhere root has permissions, or choose another user; see the sketch below.
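For example, a minimal sketch of giving root a directory it owns, mirroring the commands from the first question (run as the hdfs superuser on the sandbox):
# create an HDFS home directory for root and hand ownership to it
sudo -u hdfs hdfs dfs -mkdir -p /user/root
sudo -u hdfs hdfs dfs -chown root:hdfs /user/root
After that, root should be able to create subdirectories under /user/root through pywebhdfs as well.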
I am trying to move data from the local file system to the Hadoop Distributed File System, but I am not able to do it through Oozie.
Can we move or copy data from a local filesystem to HDFS using Oozie?
I found a workaround for this problem. The ssh action will always execute from the Oozie server. So if your files are located on the local file system of the Oozie server, you will be able to copy them to HDFS.
The ssh action will always be executed by the 'oozie' user. So your ssh action's host should look like this: myUser@oozie-server-ip, where myUser is a user with read rights on the files on the Oozie server.
Next, you need to set up passwordless ssh between the oozie user and myUser on the Oozie server. Generate a public key for the 'oozie' user and copy the generated key into the authorized_keys file of 'myUser'. This is the command for generating the RSA key:
ssh-keygen -t rsa
When generating the key, you need to be logged in as the oozie user. Usually on a Hadoop cluster this user has its home in /var/lib/oozie, and the public key will be generated as id_rsa.pub in /var/lib/oozie/.ssh.
Next, copy this key into the authorized_keys file of 'myUser'. You will find that file in the user's home directory, in the .ssh folder. A sketch of the whole key setup follows.
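This is a minimal sketch of the key setup, run on the Oozie server; the oozie paths are the typical defaults mentioned above, and myUser's home is assumed to be /home/myUser:
# as the oozie user: generate the key pair, accepting the default location
ssh-keygen -t rsa
# copy the public key into myUser's authorized_keys, e.g. with ssh-copy-id ...
ssh-copy-id myUser@localhost
# ... or append it manually (requires write access to myUser's .ssh folder)
cat /var/lib/oozie/.ssh/id_rsa.pub >> /home/myUser/.ssh/authorized_keys
# verify that the oozie user can now log in without a password
ssh myUser@localhost hostname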
Now that you have set up passwordless ssh, it is time to set up the ssh Oozie action. This action will execute the command 'hadoop' and will have as arguments 'fs', '-copyFromLocal', '${local_file_path}' and '${hdfs_file_path}'.
No, Oozie isn't aware of the local filesystem, because its actions run on the MapReduce cluster nodes. You should use Apache Flume to move data from a local filesystem to HDFS.
Oozie does not support a copy action from local to HDFS or vice versa, but you can call a Java program to do the same. A shell action will also work, but if you have more than one node in the cluster, all of those nodes must have the local mount point available, mounted with read/write access.
You can do this using an Oozie shell action by putting the copy command in the shell script.
https://oozie.apache.org/docs/3.3.0/DG_ShellActionExtension.html#Shell_Action
Example:
<workflow-app name="reputation" xmlns="uri:oozie:workflow:0.4">
    <start to="shell"/>
    <action name="shell">
        <shell xmlns="uri:oozie:shell-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>run.sh</exec>
            <file>run.sh#run.sh</file>
            <capture-output/>
        </shell>
        <ok to="end"/>
        <error to="kill"/>
    </action>
    <kill name="kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
In your run.sh you can use the hadoop fs -copyFromLocal command; a minimal sketch follows.
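This is a sketch of such a run.sh; the source and destination paths are placeholders for your own:
#!/bin/bash
# The shell action runs on an arbitrary cluster node, so the local path
# must exist on that node (see the note about mount points above).
hadoop fs -copyFromLocal /local/path/sample.txt /user/myuser/data/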
I'm trying to move my daily Apache access log files into a Hive external table by copying the daily log files to the relevant HDFS folder for each month.
I tried to use a wildcard, but it seems that hdfs dfs doesn't support it (the documentation seems to say that it should).
Copying individual files works:
$ sudo HADOOP_USER_NAME=myuser hdfs dfs -put "/mnt/prod-old/apache/log/access_log-20150102.bz2" /user/myuser/prod/apache_log/2015/01/
But all of the following ones throw "No such file or directory":
$ sudo HADOOP_USER_NAME=myuser hdfs dfs -put "/mnt/prod-old/apache/log/access_log-201501*.bz2" /user/myuser/prod/apache_log/2015/01/
put: `/mnt/prod-old/apache/log/access_log-201501*.bz2': No such file or directory
$ sudo HADOOP_USER_NAME=myuser hdfs dfs -put /mnt/prod-old/apache/log/access_log-201501* /user/myuser/prod/apache_log/2015/01/
put: `/mnt/prod-old/apache/log/access_log-201501*': No such file or directory
The environment is Hadoop 2.3.0-cdh5.1.3.
I'm going to answer my own question.
So hdfs dfs -put does work with wildcards; the problem is that the input directory is not a local directory, but a mounted SSHFS (FUSE) drive.
It seems that it is SSHFS that cannot handle the wildcard characters.
Below is proof that hdfs dfs -put works just fine with wildcards when using the local filesystem rather than the mounted drive:
$ sudo HADOOP_USER_NAME=myuser hdfs dfs -put /tmp/access_log-201501* /user/myuser/prod/apache_log/2015/01/
put: '/user/myuser/prod/apache_log/2015/01/access_log-20150101.bz2': File exists
put: '/user/myuser/prod/apache_log/2015/01/access_log-20150102.bz2': File exists