I have a file named file name(1).zip (with a space and parentheses in its name) and I want to put this file on HDFS. But every time I try to put it via hadoop fs -put ..., I get an exception.
I have even tried quoting the file name and escaping the space and parentheses, but it doesn't work.
hduser@localhost:/tmp$ hadoop fs -put file\ name\(1\).zip /tmp/one
15/06/05 15:57:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
put: unexpected URISyntaxException
hduser#localhost:/tmp$ hadoop fs -put "file\ name\(1\).zip" /tmp/one/
15/06/05 15:59:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
put: unexpected URISyntaxException
hduser#localhost:/tmp$ hadoop fs -put "file name(1).zip" /tmp/one/
15/06/05 16:00:36 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
put: unexpected URISyntaxException
Is there any work-around to put such files on HDFS, or am I missing something here?
Replace the spaces with %20, which is the percent-encoding for a space.
Use
hadoop fs -put first%20name.zip /tmp/one
instead of
hadoop fs -put first name.zip /tmp/one
HDFS itself is totally fine with spaces in file or directory names.
It is the hadoop fs -put command that cannot handle a local file name containing spaces, because it parses the path as a URI (hence the URISyntaxException). But there is a trick to achieve this (reference):
cat file\ name\(1\).zip | hadoop fs -put - "/tmp/one/file name(1).zip"
Hope this helps those that need this.
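The same stdin trick works for any awkward name. A minimal sketch with the file name in a shell variable, assuming the /tmp/one target directory from the question:
src='file name(1).zip'
cat "$src" | hadoop fs -put - "/tmp/one/$src"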
Try
hadoop fs -put 'file name(1).zip' /tmp/one
hadoop fs -get /landing/novdata/'2017-01-05 - abc def 5H.csv'
Note the single quotes around the file name.
The most obvious workaround is to rename the file before storing it on HDFS, don't you think?
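A minimal sketch of that workaround, assuming the /tmp/one target from the question and a made-up space-free name:
cp 'file name(1).zip' file_name_1.zip    # copy (or mv) to a name without spaces or parentheses
hadoop fs -put file_name_1.zip /tmp/one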
Related
While running a Hadoop MapReduce program using Hadoop Pipes, a file that is present in HDFS is not found by the MapReduce job. If the program is executed without Hadoop Pipes, the file is found without problems by the libhdfs library, but when running the program with the
hadoop pipes -input i -output o -program p
command, the file is not found by libhdfs and a java.io exception is thrown. I have tried including the -fs parameter in the command, but with the same result. I have also prefixed the files with hdfs://localhost:9000/, still with no results. The file parameter is set inside the C code as:
file="/path/to/file/in/hdfs" or "hdfs://localhost:9000/path/to/file"
hdfsFS fs = hdfsConnect("localhost", 9000);
hdfsFile input = hdfsOpenFile(fs, file, O_RDONLY, 0, 0, 0);
Found the problem. The files in HDFS were not available to the MapReduce task nodes. So instead I had to pass the files to the distributed cache through the archives option, after compressing them into a single tar file. This can also be achieved by writing a custom InputFormat class and providing the files in the input parameter.
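A sketch of that distributed-cache route, assuming the pipes job accepts generic options such as -archives, and using made-up paths sidedata/ and /user/hduser/sidedata.tar:
tar -cf sidedata.tar sidedata/
hadoop fs -put sidedata.tar /user/hduser/sidedata.tar
hadoop pipes -archives hdfs://localhost:9000/user/hduser/sidedata.tar#sidedata -input i -output o -program p
Inside each task the archive is unpacked and symlinked as sidedata in the working directory, so the program can read those files with plain local I/O instead of going through libhdfs.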
I have copied a few files to a path, but when I try to run the command hdfs dfs -ls path/filename it returns no file found.
hdfs dfs -ls up to the directory works, but when I use the file name it returns no files found. For one of the files, I copied and pasted the file name using Ambari; after that, the file started being returned by hdfs dfs -ls path/filename.
What is causing this issue?
When you execute hdfs dfs -ls path/filename, you are asking HDFS to show all the files that are in that directory; if the final path component is a file, of course nothing is listed. You must point to a directory, not a file.
@saravanan it seems like a permission issue if the file shows up only after using Ambari. Make sure the files have the correct owner and permissions before trusting the command output. The ls command will list files and folders, per the documentation.
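A quick way to compare owners and permissions (a sketch with placeholder paths; substitute your own):
hdfs dfs -ls -d /path/to/directory    # the directory entry itself
hdfs dfs -ls /path/to/directory       # its children, including the file in question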
Here is the full documentation for the ls command:
[root@hdp ~]# hdfs dfs -help ls
-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [-e] [<path> ...] :
List the contents that match the specified file pattern. If path is not
specified, the contents of /user/<currentUser> will be listed. For a directory a
list of its direct children is returned (unless -d option is specified).
Directory entries are of the form:
permissions - userId groupId sizeOfDirectory(in bytes)
modificationDate(yyyy-MM-dd HH:mm) directoryName
and file entries are of the form:
permissions numberOfReplicas userId groupId sizeOfFile(in bytes)
modificationDate(yyyy-MM-dd HH:mm) fileName
-C Display the paths of files and directories only.
-d Directories are listed as plain files.
-h Formats the sizes of files in a human-readable fashion
rather than a number of bytes.
-q Print ? instead of non-printable characters.
-R Recursively list the contents of directories.
-t Sort files by modification time (most recent first).
-S Sort files by size.
-r Reverse the order of the sort.
-u Use time of last access instead of modification for
display and sorting.
-e Display the erasure coding policy of files and directories.
I have an input file on Linux and it has a header. I cannot modify this file since I only have read-only access to it.
I am able to copy this file successfully from Linux to HDFS using the copyFromLocal command.
But the header should not be present in the HDFS file, and I do not have access to modify the Linux input file to remove the header.
Is there any way to skip or ignore the header while copying the file from Linux to HDFS, something like copyFromLocal -1 input_file_name hdfs_file_name?
Remove the first line using awk and pipe the rest to HDFS:
awk 'NR != 1 {print}' file.txt | hdfs dfs -put - hdfs://nn1/user/cloudera
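An equivalent sketch using tail instead of awk, assuming you want the result stored as file.txt under that same directory (the destination file name here is just an example):
tail -n +2 file.txt | hdfs dfs -put - hdfs://nn1/user/cloudera/file.txt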
I have to copy certain CSV files to HDFS. The files are named in the format
ABCDWXYZ.csv, viz. PERSONDETAILS.csv, and each has to be copied to an HDFS directory named AbcdWxyz, viz. PersonDetails.
Now the problem is that I don't have the exact HDFS directory name; I derive it from the CSV file name after trimming it, and then fire put:
hadoop fs -put $localRootDir/$Dir/*.csv $HDFSRootDir/$Dir
but it throws an error because there is no directory in HDFS with an all-uppercase name.
Now how can I copy the file to HDFS? Is there a way to make the Hadoop put command case-insensitive, using a regex or natively?
Or is there a way by which I can convert the string to the required CamelCase?
You should be able to use
hadoop fs -find / -iname $Dir -print
to get the path name in the correct spelling as it exists in HDFS. Then feed that back into your copy command.
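A minimal sketch of that round trip, assuming $Dir and $localRootDir are set as in the question and that the find returns exactly one match:
target=$(hadoop fs -find / -iname "$Dir" -print | head -n 1)
hadoop fs -put "$localRootDir/$Dir"/*.csv "$target"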
A file named records.txt can be copied from the local file system to HDFS using the command below:
hadoop dfs -put /home/cloudera/localfiles/records.txt /user/cloudera/inputfiles
With the above command, the file records.txt is copied into HDFS under the same name.
But I want to store two files (records1.txt and demo.txt) in HDFS.
I know that we can use something like the following:
hadoop dfs -put /home/cloudera/localfiles/records* /user/cloudera/inputfiles
but is there a command that will let us copy two files with different names into HDFS?
The put command accepts one or multiple source files, as mentioned here. So try something like:
hadoop dfs -put /home/cloudera/localfiles/records* /home/cloudera/localfiles/demo* /user/cloudera/inputfiles
From hadoop shell command usage:
put
Usage: hadoop fs -put <localsrc> ... <dst>
Copy single src, or multiple srcs from local file system to the destination filesystem. Also reads input from stdin and writes to destination filesystem.
hadoop fs -put localfile /user/hadoop/hadoopfile
hadoop fs -put localfile1 localfile2 /user/hadoop/hadoopdir
hadoop fs -put localfile hdfs://nn.example.com/hadoop/hadoopfile
hadoop fs -put - hdfs://nn.example.com/hadoop/hadoopfile
Reads the input from stdin.
Exit Code:
Returns 0 on success and -1 on error.
It can be done using the copyFromLocal command as follows:
hduser@ubuntu:/usr/local/pig$ hadoop dfs -copyFromLocal /home/user/Downloads/records1.txt /home/user/Downloads/demo.txt /user/pig/output