hdfs dfs -ls path/filename - hdfs

I have copied a few files to a path in HDFS, but when I run the command hdfs dfs -ls path/filename it reports that no such file was found.
hdfs dfs -ls works up to the directory level, but as soon as I append the file name it returns no files found. For one of the files, I copied and pasted the file name from Ambari, and after that hdfs dfs -ls path/filename started returning the file.
What is causing this issue?

Because when you execute hdfs dfs -ls path/filename, what you are asking HDFS is "show me all the files that are in this directory"; if the end of the path is a file, of course, you are not listing anything. You must point to a directory, not a file.

@saravanan it seems like a permission issue if the file only shows up after going through Ambari. Make sure the files have the expected owner and permissions, then re-run the command. Per the documentation, the ls command lists both files and folders; a quick check for hidden characters in the file name is sketched after the help text below.
Here is full documentation for ls command:
[root@hdp ~]# hdfs dfs -help ls
-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [-e] [<path> ...] :
  List the contents that match the specified file pattern. If path is not
  specified, the contents of /user/<currentUser> will be listed. For a directory a
  list of its direct children is returned (unless -d option is specified).

  Directory entries are of the form:
        permissions - userId groupId sizeOfDirectory(in bytes)
        modificationDate(yyyy-MM-dd HH:mm) directoryName

  and file entries are of the form:
        permissions numberOfReplicas userId groupId sizeOfFile(in bytes)
        modificationDate(yyyy-MM-dd HH:mm) fileName

  -C  Display the paths of files and directories only.
  -d  Directories are listed as plain files.
  -h  Formats the sizes of files in a human-readable fashion
      rather than a number of bytes.
  -q  Print ? instead of non-printable characters.
  -R  Recursively list the contents of directories.
  -t  Sort files by modification time (most recent first).
  -S  Sort files by size.
  -r  Reverse the order of the sort.
  -u  Use time of last access instead of modification for
      display and sorting.
  -e  Display the erasure coding policy of files and directories.
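If the permissions check out, another thing worth ruling out is an invisible character (trailing space, tab, or carriage return) in the file name picked up during the copy. A minimal check, assuming /path/to/dir stands in for your directory:
hdfs dfs -ls -q /path/to/dir          # -q prints ? in place of non-printable characters
hdfs dfs -ls /path/to/dir | cat -A    # cat -A makes trailing spaces and tabs visible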

Related

Cannot use a regular expression in Hadoop from the Linux command line

I have a folder that contains a large number of subfolders whose names are dates from 2018. In my HDFS I have created a folder of just the December dates (formatted 2018-12-) and I need to delete specifically days 21 - 25. I copied this folder from HDFS to my Docker container and used the command
rm -r *[21-25]
inside that folder, and it worked as expected. But when I run the same command adapted to HDFS,
hdfs dfs -rm -r /home/cloudera/logs/2018-Dec/*[21-25]
it gives me the error
rm: `/home/cloudera/logs/2018-Dec/*[21-25]': No such file or directory
If you need something explained in more detail, leave a comment. I'm brand new to all of this and I don't fully understand how to phrase some of these things.
I figured it out with the help of @Barmer. I was referring to my local system's base directory, and I also had to change the glob pattern to 2[1-5]. The command ended up being hdfs dfs -rm -r /user/cloudera/logs/2018-Dec/*2[1-5].
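For reference, the difference matters because [21-25] is a single-character set (it matches one character that is 1, 2, or 5), while 2[1-5] matches the two-character sequences 21 through 25. A quick way to preview what a glob will hit before deleting, using the same hypothetical path:
hdfs dfs -ls -d /user/cloudera/logs/2018-Dec/*2[1-5]   # -d shows the matching entries themselves, not their contents
hdfs dfs -rm -r /user/cloudera/logs/2018-Dec/*2[1-5]   # then remove them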

How to get a line count of all individual files in a directory on AWS S3 using a terminal?

I am new to terminal commands. I know we can do something like wc -l directory/* if the files were local.
But how do I achieve the same on AWS S3 using a terminal?
The output should be the file name and the count.
For example,
there are two files present in a directory in S3: 'abcd.txt' (5 lines) and 'efgh.txt' (10 lines). I want the line count of each file, from the terminal, without downloading the files.
Output -
'abcd.txt' 5
'efgh.txt' 10
In case it's helpful, here's a quick shell script that uses the awscli.
#!/bin/bash
# Grab the object names (4th column of `aws s3 ls` output), then stream each
# object to stdout and count its lines.
FILES=$(aws s3 ls s3://mybucket/csv/ | tr -s ' ' | cut -d ' ' -f4)
for file in $FILES; do
    echo "$file, $(aws s3 cp "s3://mybucket/csv/$file" - | wc -l)"
done
Example of output:
planets.csv, 8
countries.csv, 195
continents.csv, 7
Note that it effectively streams each file to stdout and counts the lines there, so it doesn't persist any file locally. Making it work recursively, or against collections of S3 objects that include non-text files, would take a little additional work; a recursive sketch follows.
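Here is a minimal sketch of such a recursive variant, under the assumptions that mybucket and the csv/ prefix are placeholders, that only .csv objects should be counted, and that keys contain no spaces:
#!/bin/bash
# Hypothetical recursive variant: list every key under the prefix, keep only
# .csv objects, and count lines by streaming each object through stdout.
BUCKET=mybucket
PREFIX=csv/
aws s3 ls "s3://$BUCKET/$PREFIX" --recursive | awk '{print $4}' | grep '\.csv$' |
while read -r key; do
    echo "$key, $(aws s3 cp "s3://$BUCKET/$key" - | wc -l)"
done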
It is not possible with a simple command. Amazon S3 does not provide the ability to 'remotely' count the number of lines in an object.
Instead, you would need to download the files to your computer and then count the number of lines.
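If downloading is acceptable, a minimal sketch of that approach (bucket and prefix are placeholders):
aws s3 sync s3://mybucket/csv/ ./local-csv/   # copy the prefix to a local directory
wc -l ./local-csv/*                           # per-file line counts plus a total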

Solaris: Regex how to select files with certain filename

First of all, the server runs Solaris.
The context of my question is Informatica PowerCenter.
I need to list the files sitting in the Inbox directory. Basically, the outcome should be one file list per type of file. The different file types are distinguished by the file name. I don't want to update the script every time a new file type appears, so I was thinking of a parameterized shell script taking the regex, the inbox directory and the file list.
An example:
/Inbox/ABC.DEFGHI.PAC.AE.1236547.49566
/Inbox/ABC.DEFGHI.PAC.AE.9876543.21036
/Inbox/DEF.JKLMNO.PAC.AI.1236547.49566
... has to result in 2 list files containing the path and file name of the listed files:
/Inbox/PAC.AE.FILELIST
-->/Inbox/ABC.DEFGHI.PAC.AE.1236547.49566
-->/Inbox/ABC.DEFGHI.PAC.AE.9876543.21036
/Inbox/PAC.AI.FILELIST
-->/Inbox/DEF.JKLMNO.PAC.AI.1236547.49566
Assuming all input files follow the convention you indicate (when splitting on dots, the 3rd and 4th fields determine the type), this script might do the trick:
#!/usr/bin/env bash
# First parameter, or the current directory if not given
INPUTDIR=${1:-.}
# Second parameter, or the input directory if not given
OUTPUTDIR=${2:-$INPUTDIR}
# Keep plain files only (ls -p marks directories with a trailing /)
INPUTFILES=$(ls -p "$INPUTDIR" | grep -v "/")
echo "Input: $INPUTDIR, output: $OUTPUTDIR"
for FILE in $INPUTFILES; do
    # The type is the 3rd and 4th dot-separated fields, e.g. PAC.AE
    FILETYPE=$(echo "$FILE" | cut -d. -f3,4)
    COLLECTION_FILENAME="$OUTPUTDIR/${FILETYPE:-UNKNOWN}.FILELIST"
    # Record path and file name, matching the expected output above
    echo "$INPUTDIR/$FILE" >> "$COLLECTION_FILENAME"
done
Usage:
./script.sh Inbox Inbox/collections
This will read all files (not directories) from Inbox and write the collection files to Inbox/collections (the output directory must already exist). Entries inside each collection come out in alphabetical order, since ls lists them that way.
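A hypothetical run against the example Inbox above (paths depend on the parameters you pass):
./script.sh /Inbox /Inbox
cat /Inbox/PAC.AE.FILELIST
# /Inbox/ABC.DEFGHI.PAC.AE.1236547.49566
# /Inbox/ABC.DEFGHI.PAC.AE.9876543.21036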

How can I list subdirectories recursively for HDFS?

I have a set of directories created recursively in HDFS. How can I list all of those directories? For a normal Unix file system I can do that using the command below:
find /path/ -type d -print
But I want to do the same thing for HDFS.
To list directory contents recursively, the hadoop dfs -lsr /dirname command can be used.
To filter only directories, you can grep for "drwx" (since the owner has rwx permission on directories) in the output of the above command.
Hence the whole command will look as below:
$ hadoop dfs -lsr /sqoopO7 | grep drwx
The answer given by @Shubhangi Pardeshi is correct, but in recent Hadoop versions that command has been deprecated. The newer equivalent can be used as below:
hdfs dfs -ls -R /user | grep drwx
The following should be more robust for getting only directories, because it depends less on the permission bits:
hdfs dfs -ls -R /folder | grep "^d"
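If you only want the paths themselves (no permissions, sizes, or dates), a small follow-up sketch, assuming the default listing format where the path is the 8th field:
hdfs dfs -ls -R /folder | grep "^d" | awk '{print $8}'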

Unix List Files After Modified Date of a File

I know I can use ls -ltr to list files in time order, but is there a way to run a command that lists only the files created after the creation date of another file (let's call it acknowledge.tmp)?
I assume that the command will look something like this:
ls -1 /path/to/directory | ???
Use the -newer option to the find command:
  -newer file   True if the current file has been modified
                more recently than the argument file.
So your command would be
find /path/to/directory -newer acknowledge.tmp
However, this will descend into, and return results from, sub-directories.
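If you only want matches from the top-level directory itself, a hedged variant, assuming your find supports the -maxdepth extension (GNU find does; some older Unix finds do not):
find /path/to/directory -maxdepth 1 -newer acknowledge.tmp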