Get the absolute path of a file in HDFS

I want to know the absolute path (possibly including the host or IP address) of a file in HDFS. What is the command for that? I previously tried something with readlink -f (which worked), but I forgot what I did.

The absolute path of an HDFS file is hdfs://<cluster>/user/xxx/..., where <cluster> is the logical name of the cluster in an HA deployment.
In a non-HA deployment, <cluster> is the NameNode's FQDN or its IP address.

You can get that value with the following command:
hdfs getconf -confKey fs.defaultFS
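To turn that into the full absolute path of a particular file, prepend the returned value to the file's HDFS path. A minimal sketch (the nameservice, user, and file names below are placeholders, not taken from the question):
$ hdfs getconf -confKey fs.defaultFS
hdfs://nameservice1
$ hdfs dfs -ls /user/xxx/data.txt
# absolute path of the file: hdfs://nameservice1/user/xxx/data.txt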


How to convert a client-based file path to a server-based path when creating a symlink within an NFS drive?

I am developing NFS server modules (ProcedureSYMLINK) and ran into a path-conversion issue when creating a symlink.
After starting the NFS server service, I connected to the server and mounted it as a drive on a Linux client.
Then I ran the command below to create a symbolic link within the NFS drive using a full path and debugged it on the server side, but the target file path is not expressed in server-side terms.
ln -s /mnt/nfs/1.bin /mnt/nfs/symlink/1.lnk
Let me give an example to clarify my question.
The base directory path on the NFS server is /usr/nfs.
So I executed the command below on the server:
./nfs_server /usr/nfs
Then I mounted the NFS share on the Ubuntu client using the command below:
sudo mount -t nfs -o vers=3,proto=tcp,port=2049 192.168.1.37:/usr/nfs /mnt/nfs
After that, I created the symbolic link:
ln -s /mnt/nfs/1.bin /mnt/nfs/symlink/1.lnk
/mnt/nfs/1.bin : Target Path
/mnt/nfs/symlink/1.lnk : Symlink Path
After entering the command above, I debugged on the server side: in the ProcedureSYMLINK function I could inspect the variables.
I could get the symlink path expressed in server terms, but the target path was not server-based; it was still /mnt/nfs/1.bin.
Is it true that there is no way to obtain the NFS client's mount base path (/mnt/nfs) on the server?
If I knew that base path, I could compute the server-side target file path (it should be /usr/nfs/1.bin), but I don't know the client's base path and see no way to calculate it.
Does anyone know?
I am using NFS v3.
Two points which will hopefully help answer your question:
In NFS, symlinks are always resolved by the client. From the perspective of the NFS server, a symlink is just a special type of file, but its content (i.e. where the symlink points to) is treated as an opaque blob.
The mountpoint on the client (/mnt/nfs in your example) is purely a client-side matter; there is no provision in the NFS protocol for letting the server know about it.
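A minimal way to see this, using the paths from the question (the first command runs on the client, the second directly on the server machine):
# on the client: create the link through the NFS mount
ln -s /mnt/nfs/1.bin /mnt/nfs/symlink/1.lnk
# on the server: the stored target is the literal client-side string;
# the server never sees anything like /usr/nfs/1.bin
readlink /usr/nfs/symlink/1.lnk
# prints: /mnt/nfs/1.bin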

How to exclude a particular folder in a GCS bucket when copying to a local machine?

I am trying to copy files and folders from Google Cloud Storage to a VM using the gsutil command, but I need to exclude a few of the folders in the GCS bucket while copying. I tried searching for an option but couldn't find one. Please help if anyone knows the command for this.
Thanks in advance.
For this you can use a command like the following (rsync needs both a source and a destination; the local path here is a placeholder):
gsutil -m rsync -r -x '^dir3/.*$' gs://bucket /path/on/your/vm
This should retrieve all objects in the bucket except those under dir3 (i.e. files inside the dir3 directory in your example are skipped).
You can find more details about the rsync command in the gsutil documentation: https://cloud.google.com/storage/docs/gsutil/commands/rsync
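If more than one folder has to be skipped, the exclusion patterns can be combined with | in a single regular expression (the folder names and destination below are made up for illustration):
gsutil -m rsync -r -x '^dir3/.*$|^dir4/.*$' gs://bucket /path/on/your/vm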

Command line interface (CLI) not working after mounting lb3 to lb4 as documented

I mounted an lb3 app into an lb4 app as documented, but now I cannot use the lb CLI and get the following error: "Warning: Found no data sources to attach model. There will be no data-access methods available until datasources are attached."
This is because the CLI looks for the JSON config file in the root directory rather than in the lb3app directory, as recommended in the documentation above.
How can I tell the CLI that the configuration files are inside the subdirectory lb3app instead of the parent directory newlb4app?
I tried executing lb from newlb4app and from the subdirectory lb3app, with no success.
I removed the file .yo-rc.json and that solved the problem. It seems the CLI looks for that file in parent directories and, if it exists, treats that location as the project root.
Once I deleted the file, the project root resolved to the current directory.
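A minimal sketch of that fix, assuming the directory layout from the question (an lb4 project newlb4app with the lb3 app mounted in lb3app):
cd newlb4app/lb3app
rm ../.yo-rc.json    # remove the stale Yeoman marker so the CLI stops treating newlb4app as the project root
lb model             # the CLI should now pick up the config files in the current directory (lb3app)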

How to copy files from a GCS bucket to my local machine

I need to copy files from Google Cloud Storage to my local machine.
I tried this command in the Compute Engine terminal:
$ sudo gsutil cp -r gs://mirror-bf /var/www/html/mydir
/var/www/html/mydir is the directory on my local machine.
I get this error:
CommandException: Destination URL must name a directory, bucket, or bucket subdirectory for the multiple source form of the cp command.
Where is the mistake?
You must first create the directory /var/www/html/mydir.
Then, you must run the gsutil command on your local machine and not in the Google Cloud Shell. The Cloud Shell runs on a remote machine and can't deal directly with your local directories.
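A minimal sketch of those two steps, using the paths from the question (run on the local machine, not in Cloud Shell):
sudo mkdir -p /var/www/html/mydir                   # create the destination directory first
gsutil -m cp -r gs://mirror-bf /var/www/html/mydir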
I had a similar problem and went through the painful process of figuring it out too, so I thought I would provide my step-by-step solution (under Windows; hopefully similar for Unix users) in the hope it helps others.
First of all (as many others have pointed out in various Stack Overflow threads), you have to run a local console in admin mode for this to work, i.e. do not use the Cloud Shell terminal.
Here are the steps:
Assuming you already have Python installed on your machine, install the gsutil Python package using pip from your console:
pip install gsutil
You will then be able to run gsutil config from that same console:
gsutil config
This creates a .boto file, which is needed to make sure you have permission to access your storage.
You are also given a URL, which you need in order to obtain the authorization code prompted for in the console.
Open a browser, paste this URL in, then:
Log in to your Google account (i.e. the account linked to your Google Cloud).
Google asks you to confirm that you want to give access to gsutil; click Allow.
You will then be given an authorization code, which you can copy and paste into your console.
Finally, you are asked for a project ID.
Get the project ID of interest from your Google Cloud console: click on "My First Project" (the project selector) and you will see a list of all your projects and their IDs.
Paste that ID into your console, hit Enter, and there you are: you have now created your .boto file. This should be all you need to be able to work with your Cloud Storage.
Console output:
Boto config file "C:\Users\xxxx\.boto" created. If you need to use a proxy to access the Internet please see the instructions in that file.
You will then be able to copy your files and folders from the cloud to your PC using the following gsutil command:
gsutil -m cp -r gs://myCloudFolderOfInterest/ "D:\MyDestinationFolder"
Files from within "myCloudFolderOfInterest" should then get copied to the destination "MyDestinationFolder" (on your local computer).
gsutil -m cp -r gs://bucketname/ "C:\Users\test"
I had put an r before the file path, i.e. r"C:\Users\test", and got the same error. So I removed the r and it worked for me.
Try with a '.' prefix, i.e. ./var:
$ sudo gsutil cp -r gs://mirror-bf ./var/www/html/mydir
Or it might be the problem below:
gsutil cp does not support copying special file types such as sockets, device files, named pipes, or any other non-standard files intended to represent an operating system resource. You should not run gsutil cp with sources that include such files (for example, recursively copying the root directory on Linux that includes /dev ). If you do, gsutil cp may fail or hang.
Source: https://cloud.google.com/storage/docs/gsutil/commands/cp
The syntax that worked for me when downloading to a Mac was:
gsutil cp -r gs://bucketname dir Dropbox/directoryname

No wildcard support in hdfs dfs put command in Hadoop 2.3.0-cdh5.1.3?

I'm trying to move my daily Apache access log files to a Hive external table by copying the daily log files into the relevant HDFS folder for each month.
I tried to use wildcards, but it seems that hdfs dfs doesn't support them? (The documentation seems to say that it should.)
Copying individual files works:
$ sudo HADOOP_USER_NAME=myuser hdfs dfs -put "/mnt/prod-old/apache/log/access_log-20150102.bz2" /user/myuser/prod/apache_log/2015/01/
But all of the following ones throw "No such file or directory":
$ sudo HADOOP_USER_NAME=myuser hdfs dfs -put "/mnt/prod-old/apache/log/access_log-201501*.bz2" /user/myuser/prod/apache_log/2015/01/
put: `/mnt/prod-old/apache/log/access_log-201501*.bz2': No such file or directory
$ sudo HADOOP_USER_NAME=myuser hdfs dfs -put /mnt/prod-old/apache/log/access_log-201501* /user/myuser/prod/apache_log/2015/01/
put: `/mnt/prod-old/apache/log/access_log-201501*': No such file or directory
The environment is Hadoop 2.3.0-cdh5.1.3.
I'm going to answer my own question.
hdfs dfs -put does work with wildcards; the problem is that the input directory is not a local directory but a mounted SSHFS (FUSE) drive.
It seems that SSHFS is what cannot handle the wildcard characters.
Below is proof that hdfs dfs -put works just fine with wildcards when using the local filesystem rather than the mounted drive:
$ sudo HADOOP_USER_NAME=myuser hdfs dfs -put /tmp/access_log-201501* /user/myuser/prod/apache_log/2015/01/
put: '/user/myuser/prod/apache_log/2015/01/access_log-20150101.bz2': File exists
put: '/user/myuser/prod/apache_log/2015/01/access_log-20150102.bz2': File exists
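For anyone hitting the same symptom, a quick check of whether the mount (rather than hdfs dfs -put) is the problem is to expand the same pattern with plain shell tools on the mount:
# if this also reports "No such file or directory", the glob is failing on the SSHFS mount itself
ls /mnt/prod-old/apache/log/access_log-201501*.bz2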