How to put a file from a local computer to HDFS

Is there any way to push files from a local computer to HDFS?
I have tried sending a GET request to port 50070, but it always shows
"Authentication required".
Please help me! I am quite new to HDFS.

To create a new folder:
hadoop fs -mkdir your-folder-path-on-HDFS
Example:
hadoop fs -mkdir /folder1 /folder2
hadoop fs -mkdir hdfs://<namenode-host>:<port>/folder1
To upload your file:
bin/hdfs dfs -put ~/#youruploadfolderpath# /#yourhadoopfoldername#
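Since you mention sending requests to port 50070, here is a minimal sketch of uploading through the WebHDFS REST API instead of the shell. It assumes WebHDFS is enabled and the cluster uses simple authentication (no Kerberos); <namenode>, <your-user> and the file paths are placeholders. The upload is a two-step operation: the NameNode answers the first PUT with a 307 redirect, and the file body is then sent to the DataNode URL from the Location header.
# Step 1: ask the NameNode where to write; no body is sent, the reply is a 307 redirect
curl -i -X PUT "http://<namenode>:50070/webhdfs/v1/folder1/sample.txt?op=CREATE&user.name=<your-user>"
# Step 2: send the file contents to the DataNode URL returned in the Location header of step 1
curl -i -X PUT -T sample.txt "<Location-URL-from-step-1>"
On a Kerberos-secured cluster the user.name parameter is not enough, which is one common cause of the "Authentication required" response.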

Related

Getting an error while copying a file from an AWS EC2 instance to a Hadoop cluster

I am not able to run any hadoop command; even -ls is not working (it gives the same error), and I am not able to create a directory using
hadoop fs -mkdir users.
You can create a directory in HDFS using the command:
$ hdfs dfs -mkdir "path"
and then use the command below to copy data from the local file system to HDFS:
$ hdfs dfs -put /root/Hadoop/sample.txt /"your_hdfs_dir_path"
Alternatively, you can also use the below command:
$ hdfs dfs -copyFromLocal /root/Hadoop/sample.txt /"your_hdfs_dir_path"
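As a quick sanity check afterwards (a minimal sketch; /your_hdfs_dir_path is a placeholder for the directory you created), list the target directory and confirm the file arrived:
$ hdfs dfs -ls /your_hdfs_dir_path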

How to give back HDFS permissions to the supergroup?

In order to access HDFS, I unknowingly ran the following commands as the root user (I had been trying to resolve the error shown below):
sudo su - hdfs
hdfs dfs -mkdir /user/root
hdfs dfs -chown root:hdfs /user/root
exit
Now when I try to access HDFS it says:
Call From headnode.name.com/192.168.21.110 to headnode.name.com:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
What can I do to resolve this issue? It would also be great if you could explain what the command 'hdfs dfs -chown root:hdfs /user/root' does.
I am using HDP 3.0.1.0 (Ambari)
It seems like your HDFS is down. Check if your NameNode is up.
The command hdfs dfs -chown root:hdfs /user/root changes the ownership of the HDFS directory /user/root (if it exists) to user root and group hdfs. User hdfs should be able to perform this command (or, as a matter of fact, any command in HDFS). The "root" user of HDFS is hdfs.
If you want to make user root an HDFS superuser, you can change the group of the root user to hdfs by running (as user root) usermod -g hdfs root, and then running (as user hdfs) hdfs dfsadmin -refreshUserToGroupsMappings. This will sync the user-group mappings on the server with HDFS, making user root a superuser.
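To check whether the NameNode is actually running, here is a minimal sketch (process and service names can vary by distribution; on HDP the component state is also visible in the Ambari UI):
# list the Java processes on the headnode; a healthy NameNode shows up here
jps | grep -i namenode
# or ask HDFS directly; this fails with the same ConnectionRefused error if the NameNode is down
hdfs dfsadmin -report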

Pyspark error reading file. Flume HDFS sink imports file with user=flume and permissions 644

I'm using Cloudera Quickstart VM 5.12
I have a Flume agent moving CSV files from a spooldir source into an HDFS sink. The operation works OK, but the imported files have:
User=flume
Group=cloudera
Permissions=-rw-r--r--
The problem starts when I use Pyspark and get:
PriviledgedActionException as:cloudera (auth:SIMPLE)
cause:org.apache.hadoop.security.AccessControlException: Permission denied:
user=cloudera, access=EXECUTE,
inode=/user/cloudera/flume/events/small.csv:cloudera:cloudera:-rw-r--r--
(Ancestor /user/cloudera/flume/events/small.csv is not a directory).
If I use "hdfs dfs -put ..." instead of Flume, the user and group are "cloudera", the permissions are 777, and there is no Spark error.
What is the solution? I cannot find a way to change the files' permissions from Flume. Maybe my approach is fundamentally wrong.
Any ideas?
Thank you
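One possible workaround (a sketch using standard HDFS shell commands rather than a Flume-side setting, not taken from the thread; it assumes you can run it as the flume user or an HDFS superuser) is to relax ownership and permissions on the Flume output directory after ingestion:
# hand the ingested files to the cloudera user and make them world-readable
hdfs dfs -chown -R cloudera:cloudera /user/cloudera/flume/events
hdfs dfs -chmod -R 755 /user/cloudera/flume/events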

Unable to copy from the local file system to HDFS

I am trying to copy a text file from my Mac desktop to HDFS; for that purpose I am using this command:
hadoop fs -copyFromLocal Users/Vishnu/Desktop/deckofcards.txt /user/gsaikiran/cards1
But it throws an error:
copyFromLocal: `deckofcards.txt': No such file or directory
The file definitely exists on the desktop.
Your command is missing a slash / at the source file path. It should be:
hadoop fs -copyFromLocal /Users/Vishnu/Desktop/deckofcards.txt /user/gsaikiran/cards1
More correctly/efficiently:
hdfs dfs -put /Users/Vishnu/Desktop/deckofcards.txt /user/gsaikiran/cards1
Also, if you are dealing with HDFS specifically, it is better to use the hdfs dfs syntax instead of hadoop fs [1]. (It doesn't change the output in your case, but hdfs dfs is designed for interacting with HDFS, whereas hadoop fs is the generic entry point for any supported filesystem; it is the older hadoop dfs command that has been deprecated in favour of hdfs dfs.)
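As a convenience (a small sketch; it relies on your shell expanding ~ to /Users/Vishnu, the home directory on macOS), you can also let the shell build the absolute path for you:
hdfs dfs -put ~/Desktop/deckofcards.txt /user/gsaikiran/cards1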

No wildcard support in hdfs dfs put command in Hadoop 2.3.0-cdh5.1.3?

I'm trying to move my daily Apache access log files into a Hive external table by copying the daily log files to the relevant HDFS folder for each month.
I tried to use a wildcard, but it seems that hdfs dfs doesn't support it? (The documentation seems to say that it should.)
Copying individual files works:
$ sudo HADOOP_USER_NAME=myuser hdfs dfs -put "/mnt/prod-old/apache/log/access_log-20150102.bz2" /user/myuser/prod/apache_log/2015/01/
But all of the following ones throw "No such file or directory":
$ sudo HADOOP_USER_NAME=myuser hdfs dfs -put "/mnt/prod-old/apache/log/access_log-201501*.bz2" /user/myuser/prod/apache_log/2015/01/
put: `/mnt/prod-old/apache/log/access_log-201501*.bz2': No such file or directory
$ sudo HADOOP_USER_NAME=myuser hdfs dfs -put /mnt/prod-old/apache/log/access_log-201501* /user/myuser/prod/apache_log/2015/01/
put: `/mnt/prod-old/apache/log/access_log-201501*': No such file or directory
The environment is on Hadoop 2.3.0-cdh5.1.3
I'm going to answer my own question.
So hdfs dfs -put does work with wildcards; the problem is that the input directory is not a local directory but a mounted SSHFS (FUSE) drive.
It seems that SSHFS is the part that is unable to handle the wildcard characters.
Below is proof that hdfs dfs -put works just fine with wildcards when using the local filesystem rather than the mounted drive:
$ sudo HADOOP_USER_NAME=myuser hdfs dfs -put /tmp/access_log-201501* /user/myuser/prod/apache_log/2015/01/
put: '/user/myuser/prod/apache_log/2015/01/access_log-20150101.bz2': File exists
put: '/user/myuser/prod/apache_log/2015/01/access_log-20150102.bz2': File exists
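A practical workaround in this situation (a sketch based on the /tmp run above; loghost and its remote path are placeholders for wherever the SSHFS mount actually points) is to bring the files onto the real local filesystem first and let the glob expand there:
# fetch the month's logs directly from the origin host instead of going through the SSHFS mount
scp 'loghost:/apache/log/access_log-201501*.bz2' /tmp/
# the wildcard now expands against the local disk, so -put sees every matching file
sudo HADOOP_USER_NAME=myuser hdfs dfs -put /tmp/access_log-201501*.bz2 /user/myuser/prod/apache_log/2015/01/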