Python: permission denied when extracting tar.gz - python-2.7

I would like to extract all the .tar.gz files inside a folder, but I am getting [Errno 13] Permission denied. I have been through several posts related to this problem, but nothing helps. Even extracting a specific member of a tar.gz gives the same error. Can someone help me figure out what could be wrong?
I want to create a script for unzip (.tar.gz) file via (Python)
Python: Extracting specific files with pattern from tar.gz without extracting the complete file
Overwrite existing read-only files when using Python's tarfile
import tarfile

# Open the gzip-compressed archive and extract all members into the current directory
with tarfile.open(fname, "r:gz") as tar:
    tar.extractall()
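Here is a minimal Python 3 sketch that extracts every .tar.gz in a folder. It assumes the permission error comes from the destination directory rather than from the archives. Instead of writing into the current working directory, it extracts into an explicit destination, for example a folder under your home directory. The function name extract_all_archives and the one-subfolder-per-archive layout are illustrative choices, not part of the question.

```python
import glob
import os
import tarfile


def extract_all_archives(src_dir, dest_dir):
    """Extract every *.tar.gz in src_dir into its own subfolder of dest_dir.

    dest_dir is an explicit, user-writable location (e.g. under $HOME)
    rather than whatever the current working directory happens to be.
    """
    os.makedirs(dest_dir, exist_ok=True)
    extracted = []
    for archive in sorted(glob.glob(os.path.join(src_dir, "*.tar.gz"))):
        # One output folder per archive, named after the file minus ".tar.gz"
        name = os.path.basename(archive)[: -len(".tar.gz")]
        target = os.path.join(dest_dir, name)
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(path=target)
        extracted.append(target)
    return extracted
```

For example, you could call extract_all_archives("downloads", os.path.expanduser("~/extracted")). If this still raises [Errno 13], check two things: whether the account running the script can write to the destination, and whether it can read the archives.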

Are you running this as a local user? Is this running on Unix/Linux? Does the account running the Python script have the appropriate rights to the folder you are attempting to write to?
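You can answer these questions from inside the script. The small Python 3 sketch below reports the target folder's owner and mode bits, along with what the current process may do there. The helper name describe_permissions is hypothetical. Note that os.access checks the real (not effective) user ID, so its answer can differ under setuid or some ACL setups.

```python
import os
import stat


def describe_permissions(path):
    """Report the owner and mode of `path` and what this process may do there."""
    st = os.stat(path)
    return {
        "current_uid": os.getuid() if hasattr(os, "getuid") else None,  # None on Windows
        "owner_uid": st.st_uid,
        "mode": stat.filemode(st.st_mode),       # e.g. 'drwxr-xr-x'
        "can_write": os.access(path, os.W_OK),   # needed to create files in a directory
        "can_enter": os.access(path, os.X_OK),   # search permission, also needed on POSIX
    }
```

Run it on the folder you are extracting into, for example describe_permissions(os.getcwd()). If can_write is False, extraction into that folder will fail with [Errno 13].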

This happens when you do not have permission to write to the tmp folder on your system.
Make a temporary directory in your home directory:
mkdir tmp_local
Then point the tmp directory at that local folder (using the absolute path of the directory you just created) with the following command:
export TMPDIR='/local_home/ah32097/tmp_local'
After this, you can pip install the Python package directly from the tar.gz file.
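Python's own tempfile module consults the same variable. tempfile.gettempdir() checks TMPDIR first (then TEMP and TMP) and uses the first writable directory it finds. It also caches its choice, so changing the environment later has no effect until the cache is cleared. Below is a minimal sketch that assumes you want to apply the workaround from inside a running Python process. The helper name use_temp_dir is hypothetical.

```python
import os
import tempfile


def use_temp_dir(path):
    """Point Python's tempfile module at `path`, mirroring `export TMPDIR=...`."""
    os.makedirs(path, exist_ok=True)   # like `mkdir tmp_local`
    os.environ["TMPDIR"] = path        # inherited by child processes started later
    tempfile.tempdir = None            # discard tempfile's cached choice
    return tempfile.gettempdir()       # re-evaluated; TMPDIR is checked first
```

Any child process started afterwards inherits the modified os.environ. That includes pip launched through subprocess.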

Related

Why can't my GCP script/notebook find my file?

I have a working script that finds the data file when it is in the same directory as the script. This works both on my local machine and on Google Colab.
When I try it on GCP, though, it cannot find the file. I tried three approaches:
PySpark Notebook:
Upload the .ipynb file, which includes a wget command. This downloads the file without error, but I am unsure where it saves it, and the script cannot find the file either (I assume because I am telling it that the file is in the same directory, and wget on GCP presumably saves it somewhere else by default).
PySpark with bucket:
I did the same as the PySpark notebook above, but first I uploaded the dataset to the bucket. I then used the two links provided in the file details (shown when you click the file name inside the bucket in the console); neither worked. I would like to avoid this approach anyway, as wget is much faster than downloading on my slow Wi-Fi and then re-uploading to the bucket through the console.
GCP SSH:
Create cluster
Access VM through SSH.
Upload .py file using the cog icon
wget the dataset and move both into the same folder
Run script using python gcp.py
It just gives me an error saying the file was not found.
Thanks.
As per your first and third approaches, please note the points below if you are running PySpark code on Dataproc. They apply whether you use an .ipynb file or a .py file.
If you use the wget command to download the file, it will be downloaded to the current working directory where your code is executed.
When you try to access the file through PySpark code, it looks in HDFS by default. If you want to access the downloaded file from the current working directory, use the "file:///" URI with an absolute file path.
If you want to access the file from HDFS, you have to move the downloaded file to HDFS and then access it from there using an absolute HDFS file path. See the example below:
hadoop fs -put <local file_name> </HDFS/path/to/directory>
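Here is a minimal sketch of the file:/// approach. It turns the relative name of the wget-downloaded file into an absolute file:/// URI using pathlib. The helper name local_file_uri and the file name data.csv are hypothetical.

```python
from pathlib import Path


def local_file_uri(path):
    """Turn a local (possibly relative) path into an absolute file:/// URI.

    PySpark on Dataproc resolves bare paths against HDFS by default; an explicit
    file:/// URI makes it read from the local filesystem instead.
    """
    return Path(path).resolve().as_uri()
```

In the notebook you would then pass the result to the reader, for example spark.read.csv(local_file_uri("data.csv"), header=True), where spark is the notebook's SparkSession. On a multi-node cluster, executors read a file:/// path from their own local disks. The file must therefore exist on every worker, which is why copying it to HDFS is usually the more robust option.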

cannot open path of the current working directory: Permission denied when trying to run rsync command twice in gcloud

I am trying to copy the data from the file stores to the google cloud bucket.
This is the command I am using:
gsutil rsync -r /fileserver/demo/dir1 gs://corp-bucket
corp-bucket: Name of my bucket
/fileserver/demo/dir1: Mount point directory (This directory contain the data of the file store)
This command works fine the first time: it copies the data of the directory /fileserver/demo/dir1 to the cloud bucket. But when I then delete the data from the cloud bucket and run the same command again without any changes, I get this error:
cannot open path of the current working directory: Permission denied
NOTE: If I make even a small change to a file in /fileserver/demo/dir1 and run the above command, it works fine again. My question is why it does not work without any changes, and whether there is any way to copy the files without making changes.
Thanks.
You may be hitting limitation #2 of rsync: "The gsutil rsync command considers only the live object version in the source and destination buckets." You can exclude /dir1... with the -x pattern and still let rsync do the cleanup work as part of the regular sync process.
Another way to copy those files is to use cp with the -r option, which copies recursively, instead of rsync.
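As a rough Python counterpart of the cp -r suggestion, the sketch below walks a local directory and uploads each file with the google-cloud-storage client library (Bucket.blob and Blob.upload_from_filename). It assumes that library is installed and that Application Default Credentials are configured. The helper names and the prefix parameter are illustrative. Like cp, and unlike rsync, it uploads every file on each run without comparing source and destination.

```python
import os


def plan_uploads(local_dir, prefix=""):
    """Map every file under local_dir to a GCS object name.

    Returns (local_path, object_name) pairs; object names always use '/'.
    """
    pairs = []
    for root, _dirs, files in os.walk(local_dir):
        for name in sorted(files):
            local_path = os.path.join(root, name)
            rel = os.path.relpath(local_path, local_dir).replace(os.sep, "/")
            pairs.append((local_path, prefix + rel))
    return pairs


def upload_tree(local_dir, bucket_name, prefix=""):
    # Imported lazily so plan_uploads() works without the client library installed
    from google.cloud import storage

    bucket = storage.Client().bucket(bucket_name)
    for local_path, object_name in plan_uploads(local_dir, prefix):
        bucket.blob(object_name).upload_from_filename(local_path)
```

For example, upload_tree("/fileserver/demo/dir1", "corp-bucket", prefix="dir1/") places the files under a dir1/ folder in the bucket.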

How to copy file from bucket GCS to my local machine

I need to copy files from Google Cloud Storage to my local machine.
I tried this command in the Compute Engine terminal:
$ sudo gsutil cp -r gs://mirror-bf /var/www/html/mydir
This is my directory on the local machine: /var/www/html/mydir.
I get this error:
CommandException: Destination URL must name a directory, bucket, or bucket subdirectory for the multiple source form of the cp command.
Where is the mistake?
You must first create the directory /var/www/html/mydir.
Then, you must run the gsutil command on your local machine and not in the Google Cloud Shell. The Cloud Shell runs on a remote machine and can't deal directly with your local directories.
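Both steps can be scripted. First create the destination directory, then run gsutil on the machine that actually owns that directory. Below is a minimal Python sketch, assuming gsutil is installed and on the PATH. The helper names are hypothetical.

```python
import os
import subprocess


def download_command(source_url, dest_dir):
    """Create dest_dir if needed and return the gsutil cp command for it."""
    os.makedirs(dest_dir, exist_ok=True)  # the destination directory must exist first
    return ["gsutil", "cp", "-r", source_url, dest_dir]


def download(source_url, dest_dir):
    # Run this on your local machine, not in Cloud Shell
    subprocess.run(download_command(source_url, dest_dir), check=True)
```

For example, you could run download("gs://mirror-bf", "/var/www/html/mydir") locally. Creating a directory under /var/www may itself require elevated rights.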
I had a similar problem and went through the painful process of figuring it out too, so I thought I would share my step-by-step solution (under Windows; hopefully it is similar for Unix users) in the hope that it helps others.
First (as many others have pointed out in various Stack Overflow threads), you have to run a local console (in admin mode) for this to work (i.e., do not use the Cloud Shell terminal).
Here are the steps:
Assuming you already have Python installed on your machine, install the gsutil Python package using pip from your console:
pip install gsutil
You will then be able to run gsutil config from that same console:
gsutil config
A .boto file needs to be created; it ensures that you have permission to access your Cloud Storage.
Also note that you are now given a URL, which you need in order to get the authorization code the console prompts you for.
Open a browser and paste this URL in, then:
Log in to your Google account (i.e., the account linked to your Google Cloud).
Google asks you to confirm that you want to give gsutil access. Click Allow.
You will then be given an authorization code, which you can copy and paste into your console.
Finally, you are asked for a project ID.
Get the ID of the project of interest from your Google Cloud console.
To find these IDs, click on the current project name (e.g. "My First Project") at the top of the Google Cloud console.
You will then be given a list of all your projects and their IDs.
Paste that ID into your console, hit Enter, and there you are! You have now created your .boto file. This should be all you need to work with your Cloud Storage.
Console output:
Boto config file "C:\Users\xxxx\.boto" created. If you need to use a proxy to access the Internet please see the instructions in that file.
You will then be able to copy your files and folders from the cloud to your PC using the following gsutil command:
gsutil -m cp -r gs://myCloudFolderOfInterest/ "D:\MyDestinationFolder"
Files from within "myCloudFolderOfInterest" should then get copied to the destination "MyDestinationFolder" (on your local computer).
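If you prefer Python to the gsutil CLI, the google-cloud-storage library can perform the same download; the sketch below assumes it is installed (pip install google-cloud-storage). Note that this library authenticates through Application Default Credentials (for example, gcloud auth application-default login), not through the .boto file created by gsutil config. The helper names are illustrative. Objects whose names end in "/" are skipped as zero-byte folder placeholders.

```python
import os


def local_path_for(object_name, prefix, dest_dir):
    """Map a GCS object name under `prefix` to a path inside dest_dir."""
    rel = object_name[len(prefix):].lstrip("/")
    return os.path.join(dest_dir, *rel.split("/"))


def download_prefix(bucket_name, prefix, dest_dir):
    # Lazy import: needs google-cloud-storage and configured credentials
    from google.cloud import storage

    client = storage.Client()
    for blob in client.list_blobs(bucket_name, prefix=prefix):
        if blob.name.endswith("/"):
            continue  # folder placeholder object, nothing to download
        target = local_path_for(blob.name, prefix, dest_dir)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        blob.download_to_filename(target)
```

Use a prefix that ends in "/" (for example "myCloudFolderOfInterest/"). Prefix matching is plain string matching, so without the slash a sibling folder whose name starts with the same characters would also match.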
gsutil -m cp -r gs://bucketname/ "C:\Users\test"
I put an "r" before the file path, i.e., r"C:\Users\test", and got the same error. When I removed the "r", it worked for me.
Try prefixing the path with '.', as in ./var:
$ sudo gsutil cp -r gs://mirror-bf ./var/www/html/mydir
Or you may be hitting the following problem:
gsutil cp does not support copying special file types such as sockets, device files, named pipes, or any other non-standard files intended to represent an operating system resource. You should not run gsutil cp with sources that include such files (for example, recursively copying the root directory on Linux that includes /dev ). If you do, gsutil cp may fail or hang.
Source: https://cloud.google.com/storage/docs/gsutil/commands/cp
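To check a source tree for such files before copying it, a short Python sketch can walk the tree and flag anything that is not a regular file, directory, or symlink. The helper name special_files is hypothetical. It uses os.lstat, so symlinks are reported as links rather than followed.

```python
import os
import stat


def special_files(root):
    """List entries under root that are sockets, FIFOs, device nodes, etc."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            mode = os.lstat(path).st_mode  # lstat: do not follow symlinks
            if not (stat.S_ISREG(mode) or stat.S_ISDIR(mode) or stat.S_ISLNK(mode)):
                found.append(path)
    return sorted(found)
```

If this returns anything for your source directory, exclude those paths or copy a narrower directory instead.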
The syntax that worked for me when downloading to a Mac was:
gsutil cp -r gs://bucketname dir Dropbox/directoryname

Downloading folder from GCS to local directory

I'm trying to download a folder from my Cloud Storage bucket to a local directory using this command:
gsutil cp -r gs://bucket/my_folder .
But it fails with OSError: Access is denied. Any idea how to get around this problem?
I can reproduce this error if I do not have permissions to create LOCAL_DEST_DIR on my local machine.
$ gsutil cp -r gs://BUCKET_NAME/DIR_IN_BUCKET LOCAL_DEST_DIR
Copying gs://BUCKET_NAME/DIR_IN_BUCKET/FILE...
OSError: Permission denied.
Please check you have permissions to create a file/directory in your current working directory.
You can run touch test-file.text to verify if you're able to create files in the current directory.
If you're on Linux/*nix/Mac, you will usually have full permissions to create files and directories in your $HOME directory, so you can try running the gsutil command in that directory.
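Here is the same check from Python, equivalent to running touch test-file.text and then deleting the file. Actually attempting a write is more reliable than inspecting mode bits, because it also accounts for ACLs and read-only mounts. The helper name can_create_files is hypothetical.

```python
import os
import tempfile


def can_create_files(directory="."):
    """Return True if a file can actually be created (and removed) in directory."""
    try:
        fd, path = tempfile.mkstemp(dir=directory, prefix="write-test-")
    except OSError:  # PermissionError, FileNotFoundError, read-only filesystem, ...
        return False
    os.close(fd)
    os.remove(path)
    return True
```

Run can_create_files() in the directory you launch gsutil from. If it returns False, switch to your home directory or another writable location first.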

Boto.conf not found

I am running a Flask app on an AWS EC2 server and have been using boto to access data stored in DynamoDB. I accidentally added boto.conf to a git commit (and pushed and pulled on the server). Since then, my Python code can no longer locate the boto.conf file. I rolled back the changes with git, but the problem remains.
The Python module and the boto.conf file are in the same directory, but when the module calls
boto.config.load_credential_file('boto.conf')
I get the Flask error IOError: [Errno 2] No such file or directory: 'boto.conf'.
As per the documentation:
I'm not really sure why you are using boto.config.load_credential_file. In general, boto picks up its configuration from a file called either ~/.boto or /etc/boto.cfg.
You can also look at this question on SO, which also explains how to get the configuration for boto: Getting Credentials File in the boto.cfg for Python
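One likely cause: a relative path such as 'boto.conf' is resolved against the process's current working directory, not the directory of the module that uses it. A server process may well start in a different directory. The minimal sketch below builds the path from the module's own location instead. The helper name sibling_path is hypothetical.

```python
import os


def sibling_path(module_file, filename):
    """Absolute path of `filename` in the same directory as `module_file`.

    Independent of the process's current working directory.
    """
    return os.path.join(os.path.dirname(os.path.abspath(module_file)), filename)
```

In the Flask module you would then call boto.config.load_credential_file(sibling_path(__file__, "boto.conf")).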