Hadoop Put command for two files - hdfs

A file named records.txt can be copied from the local filesystem to HDFS using the command below:
hadoop dfs -put /home/cloudera/localfiles/records.txt /user/cloudera/inputfiles
Using the above command, the file records.txt is copied into HDFS with the same name.
But I want to store two files (records1.txt and demo.txt) in HDFS.
I know that we can use something like the following:
hadoop dfs -put /home/cloudera/localfiles/records* /user/cloudera/inputfiles
but is there a command that lets us copy two or more files with different names into HDFS?

The put command accepts one or more source files, as shown in the usage below. So try something like:
hadoop dfs -put /home/cloudera/localfiles/records* /home/cloudera/localfiles/demo* /user/cloudera/inputfiles
From hadoop shell command usage:
put
Usage: hadoop fs -put <localsrc> ... <dst>
Copy single src, or multiple srcs from local file system to the destination filesystem. Also reads input from stdin and writes to destination filesystem.
hadoop fs -put localfile /user/hadoop/hadoopfile
hadoop fs -put localfile1 localfile2 /user/hadoop/hadoopdir
hadoop fs -put localfile hdfs://nn.example.com/hadoop/hadoopfile
hadoop fs -put - hdfs://nn.example.com/hadoop/hadoopfile
Reads the input from stdin.
Exit Code:
Returns 0 on success and -1 on error.
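Applied to the file names from the question, a minimal sketch naming both sources explicitly (paths taken from the question) would be:
hadoop fs -put /home/cloudera/localfiles/records1.txt /home/cloudera/localfiles/demo.txt /user/cloudera/inputfiles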

It can be done using the copyFromLocal command as follows:
hduser@ubuntu:/usr/local/pig$ hadoop dfs -copyFromLocal /home/user/Downloads/records1.txt /home/user/Downloads/demo.txt /user/pig/output

Related

No such file exists while running Hadoop pipes using c++

While running a Hadoop MapReduce program using Hadoop Pipes, a file which is present in HDFS is not found by the MapReduce job. If the program is executed without Hadoop Pipes, the file is easily found by the libhdfs library, but when running the program with the
hadoop pipes -input i -output o -program p
command, the file is not found by libhdfs and a java.io exception is thrown. I have tried to include the -fs parameter in the command but still get the same results. I have also prefixed the files with hdfs://localhost:9000/, and still no results. The file parameter inside the C code is:
file="/path/to/file/in/hdfs" or "hdfs://localhost:9000/path/to/file"
hdfsFS fs = hdfsConnect("localhost", 9000);
hdfsFile input=hdfsOpenFile(fs,file,O_RDONLY,0,0,0);
Found the problem: the files in HDFS are not available to the MapReduce task nodes. So the files instead had to be passed through the distributed cache, via the archives option, after compressing them into a single tar file. This can also be achieved by writing a custom InputFormat class and providing the files in the input parameter.
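A rough sketch of that distributed-cache workaround, assuming the generic -archives option handled by Hadoop's GenericOptionsParser and purely illustrative paths and names:
# bundle the needed files into one tar archive and upload it to HDFS
tar -cf mydata.tar myfile1.txt myfile2.txt
hadoop fs -put mydata.tar /user/cloudera/cache/mydata.tar
# ship the archive to the task nodes via the distributed cache;
# it is unpacked there under the symlink name given after '#'
hadoop pipes -archives hdfs://localhost:9000/user/cloudera/cache/mydata.tar#mydata -input i -output o -program p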

How to restore a deleted folder from HDFS

I deleted a folder from HDFS and found it under
/user/hdfs/.Trash/Current/
but I can't restore it. I looked in the forum but couldn't find a good solution.
Does anyone have a solution? How can I restore my folder to the right directory?
Thank you very much
Did you try cp or mv? e.g.,
hdfs dfs -cp -r /user/hdfs/.Trash/Current/ /hdfs/Current
Before moving your directory back, you should locate where your file is:
hadoop fs -lsr /user/<user-name>/.Trash | less
E.g., you may find:
-rw-r--r-- 3 <user-name> supergroup 111792 2020-06-28 13:17 /user/<user-name>/.Trash/200630163000/user/<user-name>/dir1/dir2/file
If dir1 is your deleted dir, move it back:
hadoop fs -mv /user/<user-name>/.Trash/200630163000/user/<user-name>/dir1 <destination>
To restore a file from
/user/hdfs/.Trash/Current/<your file>
use the -cp command, like this:
hdfs dfs -cp /user/hdfs/.Trash/Current/<your file> <destination>
You will also find that your directory/file name has changed; you can change it back to whatever you want by using -mv, like this:
hdfs dfs -mv <Your deleted filename with its path> <Your new filename with its path>
Example:
hdfs dfs -mv /hdfs/weirdName1613730289428 /hdfs/normalName

Hadoop command to ignore first / last line from input file while copying into HDFS

I have an input file in Linux and it has a header. I cannot modify this file since I only have read-only access to it.
I am able to copy this file successfully from Linux to HDFS using the copyFromLocal command.
But the header should not be present in the HDFS file, and I do not have access to modify the Linux input file to remove the header.
Is there any other way to skip/ignore the header while copying the file from Linux to HDFS, something like copyFromLocal -1 input_file_name hdfs_file_name?
Remove the first line using awk and put it to HDFS:
awk 'NR != 1 {print}' file.txt | hdfs dfs -put - hdfs://nn1/user/cloudera
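The title also asks about ignoring the last line; a hedged variant that drops both (relying on GNU head, with an illustrative destination file name) would be:
# drop the first line with awk and the last line with GNU head,
# then stream the result straight into an HDFS file
awk 'NR != 1 {print}' file.txt | head -n -1 | hdfs dfs -put - hdfs://nn1/user/cloudera/file.txt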

How to copy file to HDFS in case insensitive way

I have to copy certain CSV files to HDFS. The files are named in the format ABCDWXYZ.csv, e.g. PERSONDETAILS.csv, and each has to be copied to an HDFS directory named AbcdWxyz, e.g. PersonDetails.
Now the problem is that I don't have the exact HDFS directory name; I get it from the CSV file name after trimming it and then fire the put:
hadoop fs -put $localRootDir/$Dir/*.csv $HDFSRootDir/$Dir
but it throws an error as there is no such directory in HDFS with all-uppercase letters.
Now how can I copy the file to HDFS? Is there a way to make the Hadoop put command case insensitive, using a regex or natively?
Or is there a way by which I can convert the string to the required CamelCase?
You should be able to use
hadoop fs -find / -iname $Dir -print
to get the path name in the correct spelling as it exists in HDFS. Then feed that back into your copy command.
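A minimal sketch of that approach, reusing the variable names from the question ($localRootDir, $Dir, $HDFSRootDir) and assuming exactly one directory matches:
# resolve the case-correct directory name as it exists in HDFS
hdfsDir=$(hadoop fs -find "$HDFSRootDir" -iname "$Dir" -print | head -n 1)
# copy the CSV files into the directory under its actual spelling
hadoop fs -put "$localRootDir/$Dir"/*.csv "$hdfsDir"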

Put file on HDFS with spaces in name

I have a file named file name(1).zip (with a space and parentheses in it) and I want to put this file on HDFS. But every time I try to put it via hadoop fs -put ..., I get an exception.
I have even tried to add quotes around the file name and to escape the space and parentheses, but it doesn't work.
hduser@localhost:/tmp$ hadoop fs -put file\ name\(1\).zip /tmp/one
15/06/05 15:57:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
put: unexpected URISyntaxException
hduser#localhost:/tmp$ hadoop fs -put "file\ name\(1\).zip" /tmp/one/
15/06/05 15:59:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
put: unexpected URISyntaxException
hduser#localhost:/tmp$ hadoop fs -put "file name(1).zip" /tmp/one/
15/06/05 16:00:36 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
put: unexpected URISyntaxException
Is there any workaround to put such files on HDFS, or am I missing something here?
Replace the spaces with %20.
The percent-encoding for space is %20
Use
hadoop fs -put first%20name.zip /tmp/one
instead of
hadoop fs -put first name.zip /tmp/one
HDFS itself is totally fine with spaces in file or directory names.
It is the put command that does not accept a local file with spaces in its name, because the source path is parsed as a URI (hence the URISyntaxException). But there is a trick to achieve this:
cat file\ name\(1\).zip | hadoop fs -put - "/tmp/one/file name(1).zip"
Hope this helps those that need this.
Try
hadoop fs -put 'file name(1).zip' /tmp/one
This also works with get:
hadoop fs -get /landing/novdata/'2017-01-05 - abc def 5H.csv'
Note the single quotes around the file name.
The most obvious workaround is to rename the file before storing it on HDFS, don't you think?
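For completeness, a minimal sketch of that rename workaround (the sanitized name is just an example):
# make a copy under a name without spaces or parentheses, then upload the copy
cp 'file name(1).zip' file_name_1.zip
hadoop fs -put file_name_1.zip /tmp/one/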