How to restore a deleted folder from HDFS - hdfs

I deleted a folder from HDFS and found it under
/user/hdfs/.Trash/Current/
but I can't restore it. I looked in the forum but couldn't find a working solution.
Could someone help me restore my folder to the right directory?
Thank you very much

Did you try cp or mv? E.g.,
hdfs dfs -cp /user/hdfs/.Trash/Current/ /hdfs/Current
(cp has no -r flag; it copies directories recursively on its own.)

Before moving your directory back, you should locate where your file is:
hadoop fs -lsr /user/<user-name>/.Trash | less
E.g., you may find:
-rw-r--r-- 3 <user-name> supergroup 111792 2020-06-28 13:17 /user/<user-name>/.Trash/200630163000/user/<user-name>/dir1/dir2/file
If dir1 is your deleted directory, move it back:
hadoop fs -mv /user/<user-name>/.Trash/200630163000/user/<user-name>/dir1 <destination>
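One caveat worth adding (my note, using the same placeholder paths): mv only needs the parent of <destination> to exist, so recreate it first if it was deleted as well.
hdfs dfs -mkdir -p <destination-parent>
hdfs dfs -mv /user/<user-name>/.Trash/200630163000/user/<user-name>/dir1 <destination-parent>/dir1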

To move a file out of
/user/hdfs/.Trash/Current/<your file>
use the -cp command, like this:
hdfs dfs -cp /user/hdfs/.Trash/Current/<your file> <destination>
You may also find that your dir/file name has changed; you can rename it back to whatever you want using -mv, like this:
hdfs dfs -mv <Your deleted filename with its path> <Your new filename with its path>
Example:
hdfs dfs -mv /hdfs/weirdName1613730289428 /hdfs/normalName

Related

Cannot use regular expression in hadoop in Linux command line

I have a folder that contains a large number of subfolders that are dates from 2018. In my HDFS I have created a folder of just December dates (formatted 2018-12-) and I need to delete specifically days 21 - 25. I copied this folder from my HDFS to my docker container and used the command
rm -r *[21-25]
in that folder, it worked as expected. But when I run the same command adapted to HDFS,
hdfs dfs -rm -r /home/cloudera/logs/2018-Dec/*[21-25]
it gives me the error
rm: `/home/cloudera/logs/2018-Dec/*[21-25]': No such file or directory
If you need something explained in more detail, leave a comment. I'm brand new to all of this and don't 100% understand how to describe some of these things.
I figured it out with the help of @Barmer. I was referring to my local system's base directory, and I also had to change the pattern to 2[1-5]. So the command ended up being hdfs dfs -rm -r /user/cloudera/logs/2018-Dec/*2[1-5].
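As a sanity check before deleting (an addition from me, reusing the path from that answer), you can preview what the glob matches with ls first:
hdfs dfs -ls /user/cloudera/logs/2018-Dec/*2[1-5]
hdfs dfs -rm -r /user/cloudera/logs/2018-Dec/*2[1-5]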

How to move deleted HDFS file into the Previous location

This is the path shown when I deleted the file from the existing folder:
Moved: 'hdfs://nameservice1/user/edureka_978336/Assignment24/abc.txt' to trash at: hdfs://nameservice1/user/edureka_978336/.Trash/Current/user/edureka_978336/Assignment24/abc.txt
I'm trying to restore it with mv, but it's not working:
hdfs dfs -mv /user/edureka_978336/.Trash/Current/user/edureka_978336/Assignment24/abc.txt/user/edureka_978336/Asignment24
Can you paste the error you get when you say it's not working?
hdfs dfs -mv sourcePath targetPath
This command should work for moving the file back from the trash. Make sure you have permission to pull the data from the trash. You can try running it with sudo:
sudo -u <user.name> hdfs dfs -mv sourcePath targetPath
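A minimal sketch with the exact paths from the question (assuming the Assignment24 directory still exists); note the space separating the source and the destination:
hdfs dfs -mv /user/edureka_978336/.Trash/Current/user/edureka_978336/Assignment24/abc.txt /user/edureka_978336/Assignment24/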
Actually the move should work, if all your paths are correct.
But the important thing is how long the files are retained in the Trash.
This value is configured in core-site.xml as shown below.
<property>
  <name>fs.trash.interval</name>
  <value>30</value>
</property>
The value is in minutes; files in the trash are permanently deleted after the specified time.
More details about restoring the file here. Take a look.
https://www.linkedin.com/pulse/recovering-deleted-hdfs-files-cloudera-certified-developer-hadoop-/
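To see the value your cluster is actually running with (my addition; the key name comes from the snippet above), getconf can read it directly:
hdfs getconf -confKey fs.trash.interval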

HDFS dfs -ls path/filename

I have copied a few files to the path, but when I try to run the command hdfs dfs -ls path/filename it returns no file found.
hdfs dfs -ls up to the directory works, but when I use the file name it returns no files found. For one of the files, I copied and pasted the file name using Ambari; after that, the file started being returned by hdfs dfs -ls path/filename.
What is causing this issue?
When you execute hdfs dfs -ls path/filename, what you are telling HDFS is to show all the files that are in that directory, and if the end of the path is a file, of course, you are not listing anything. You must point to a directory, not a file.
@saravanan it seems like a permission issue if the file only shows up after using Ambari. Make sure the files have the correct ownership, then confirm with the commands. The ls command will list files and folders per the documentation.
Here is the full documentation for the ls command:
[root@hdp ~]# hdfs dfs -help ls
-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [-e] [<path> ...] :
List the contents that match the specified file pattern. If path is not
specified, the contents of /user/<currentUser> will be listed. For a directory a
list of its direct children is returned (unless -d option is specified).
Directory entries are of the form:
permissions - userId groupId sizeOfDirectory(in bytes)
modificationDate(yyyy-MM-dd HH:mm) directoryName
and file entries are of the form:
permissions numberOfReplicas userId groupId sizeOfFile(in bytes)
modificationDate(yyyy-MM-dd HH:mm) fileName
-C Display the paths of files and directories only.
-d Directories are listed as plain files.
-h Formats the sizes of files in a human-readable fashion
rather than a number of bytes.
-q Print ? instead of non-printable characters.
-R Recursively list the contents of directories.
-t Sort files by modification time (most recent first).
-S Sort files by size.
-r Reverse the order of the sort.
-u Use time of last access instead of modification for
display and sorting.
-e Display the erasure coding policy of files and directories.
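For contrast (an illustration from me, with placeholder paths): -ls on a directory lists its children, while -ls on an exact file path lists just that file's entry if it exists.
hdfs dfs -ls /user/<user-name>/data
hdfs dfs -ls /user/<user-name>/data/part-00000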

How can I list subdirectories recursively for HDFS?

I have a set of directories created recursively in HDFS. How can I list all the directories? For a normal Unix file system I can do that using the command below:
find /path/ -type d -print
But I want something similar for HDFS.
To list directory contents recursively, the hadoop dfs -lsr /dirname command can be used.
To filter only directories, you can grep for "drwx" (since the owner has rwx permission on directories) in the output of the above command.
Hence the whole command will look like this:
$ hadoop dfs -lsr /sqoopO7 | grep drwx
The answer given by @Shubhangi Pardeshi is correct, but in the latest Hadoop versions that command is deprecated. The newer command can be used as below:
hdfs dfs -ls -R /user | grep drwx
The following method should be more robust to only get directories because it depends less on the permissions.
hdfs dfs -ls -R /folder | grep "^d"
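If you only want the paths themselves (an addition from me, assuming the usual eight-column ls output), you can strip the metadata columns with awk:
hdfs dfs -ls -R /folder | grep "^d" | awk '{print $8}'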

Hadoop Put command for two files

A file named records.txt can be copied from local to HDFS using the command below:
hadoop dfs -put /home/cloudera/localfiles/records.txt /user/cloudera/inputfiles
Using the above command, the file records.txt will be copied into HDFS with the same name.
But I want to store two files (records1.txt and demo.txt) in HDFS.
I know that we can use something like the below:
hadoop dfs -put /home/cloudera/localfiles/records* /user/cloudera/inputfiles
but is there any command that will let us copy two files with different names into HDFS?
The put command accepts one or multiple source files, as mentioned here. So try something like:
hadoop dfs -put /home/cloudera/localfiles/records* /home/cloudera/localfiles/demo* /user/cloudera/inputfiles
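Applied to the exact files named in the question (assuming they exist under /home/cloudera/localfiles), that would be:
hdfs dfs -put /home/cloudera/localfiles/records1.txt /home/cloudera/localfiles/demo.txt /user/cloudera/inputfiles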
From hadoop shell command usage:
put
Usage: hadoop fs -put <localsrc> ... <dst>
Copy single src, or multiple srcs from local file system to the destination filesystem. Also reads input from stdin and writes to destination filesystem.
hadoop fs -put localfile /user/hadoop/hadoopfile
hadoop fs -put localfile1 localfile2 /user/hadoop/hadoopdir
hadoop fs -put localfile hdfs://nn.example.com/hadoop/hadoopfile
hadoop fs -put - hdfs://nn.example.com/hadoop/hadoopfile
Reads the input from stdin.
Exit Code:
Returns 0 on success and -1 on error.
It can be done using the copyFromLocal command as follows:
hduser@ubuntu:/usr/local/pig$ hadoop dfs -copyFromLocal /home/user/Downloads/records1.txt /home/user/Downloads/demo.txt /user/pig/output
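To confirm the copy afterwards (my addition, reusing the destination from the earlier answer), list the target directory:
hdfs dfs -ls /user/cloudera/inputfiles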