I'm trying to copy files from my VM to my local computer.
I can do this with the standard command
sudo gcloud compute scp --recurse orca-1:/opt/test.txt .
However, when I download the log files, they transfer but they are empty (empty files are created with the same names).
I'm also unable to use the Cloud Shell 'Download' UI button because it reports "No such file" even though the absolute file path is correct (cat /path returns the data).
I suspect it's somehow a permissions issue with log files?
Thanks for the replies to my thread above. I figured out it was a permissions issue on my files.
Interestingly, the first time I ran the commands they did not throw any errors, permission-related or otherwise: all the expected files were downloaded, but they were empty. When I tested again, the commands threw permission errors. I then gave the files in question public read permissions, and they downloaded successfully.
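For anyone hitting the same thing, here is a minimal sketch of how one might list the files that would be affected before copying. It assumes the log files are owned by another user (root, for example), so that the "others" read bit is what decides whether the copying user can read them. The function name and the /var/log path are only illustrative.

```python
import os
import stat


def files_not_world_readable(directory):
    """Return the sorted paths of regular files under `directory`
    that do not have the 'others' read permission bit set."""
    missing = []
    for root, _dirs, names in os.walk(directory):
        for name in names:
            path = os.path.join(root, name)
            mode = os.lstat(path).st_mode
            # Only regular files matter here; skip symlinks, sockets, etc.
            if stat.S_ISREG(mode) and not mode & stat.S_IROTH:
                missing.append(path)
    return sorted(missing)


# Illustrative usage (the path is an assumption, not from the question):
# for path in files_not_world_readable("/var/log"):
#     print(path)
```

Any file this reports could then be given read permission (for example with chmod o+r) before running the copy again.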
Hi, I want to reach some files in a GCP bucket from the Cloud Shell terminal (for SFTP reasons). gcsfuse successfully mounts the parent directory and it shows all the directories except the one I need. Any ideas what I am doing wrong?
In Google Cloud Storage, object names ending in a slash (/) represent a directory, and all other object names represent a file. By default, directories are not implicitly defined; they exist only if a matching object ending in a slash (/) exists.
Since the usual file system operations like mkdir will do the right thing, if someone set up a bucket's structure using only gcsfuse then they will not notice anything odd about this. However, if someone uses some other tool to set up objects in Google Cloud Storage (such as the storage browser in the Google Cloud Console), they may notice that not all objects are visible until they create leading directories for them.
For example, let's say someone uploaded an object demo/start.txt by choosing the folder upload option in the storage browser section in Google Cloud Console, then mounted it with gcsfuse. The file system will initially appear empty, since there is no demo/ object. However if they subsequently run mkdir demo, they will now see a directory named demo containing a file named start.txt.
To mitigate this issue gcsfuse supports a flag called --implicit-dirs. When this flag is enabled, name lookup requests use the Google Cloud Storage API's Objects.list operation to search for objects that would implicitly define the existence of a directory with the name in question. So, in the example above, a directory named demo containing a file start.txt would appear.
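To make the rule concrete, here is a toy Python model of it. This is not how gcsfuse is implemented; it only works on a plain list of object names, and the function name is made up for illustration. It shows which directories exist with and without implicit directories.

```python
def directories(object_names, implicit_dirs=False):
    """Return the set of directory paths defined by a list of object names.

    Without implicit_dirs, only objects whose names end in '/' define a
    directory. With implicit_dirs, every path prefix of every object
    also counts as a directory.
    """
    dirs = set()
    for name in object_names:
        if name.endswith("/"):
            # Explicit directory placeholder object, e.g. "demo/".
            dirs.add(name.rstrip("/"))
        if implicit_dirs:
            # Every leading path component implies a directory.
            parts = name.split("/")[:-1]
            for i in range(1, len(parts) + 1):
                dirs.add("/".join(parts[:i]))
    return dirs


print(directories(["demo/start.txt"]))                      # set()
print(directories(["demo/start.txt"], implicit_dirs=True))  # {'demo'}
print(directories(["demo/", "demo/start.txt"]))             # {'demo'}
```

The first call corresponds to the mount that initially looks empty, the second to mounting with --implicit-dirs, and the third to the state after running mkdir demo.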
So in your case I suspect the directory you are not able to see is a folder that was uploaded to the Google Cloud Storage bucket. Since you have already mounted the bucket on a directory with gcsfuse, mounting it again with the --implicit-dirs flag will throw an error. I would suggest unmounting the directory first by running the following command -
fusermount -u /path/to/mount/directory
Then mount the directory again by running the following command -
gcsfuse --implicit-dirs BUCKET_NAME /path/to/mount/directory
You can also create a new directory and mount that directory with gcsfuse without unmounting the existing mounted directory.
Please note that the --implicit-dirs flag has some drawbacks. I recommend reading this GitHub issue for detailed information about them.
I have a working script that finds the data file when it is in the same directory as the script. This works both on my local machine and Google Colab.
When I try it on GCP, though, it cannot find the file. I tried 3 approaches:
PySpark Notebook:
Upload the .ipynb file, which includes a wget command. This downloads the file without error, but I am unsure where it saves it, and the script cannot find the file either (I assume because I am telling it that the file is in the same directory, and presumably wget on GCP saves it somewhere else by default).
PySpark with bucket:
I did the same as in the PySpark notebook above, but first I uploaded the dataset to the bucket and then used the two links provided in the file details when you click the file name inside the bucket on the console (neither worked). I would like to avoid this, though, as wget is much faster than downloading on my slow wifi and then re-uploading to the bucket through the console.
GCP SSH:
Create cluster
Access VM through SSH.
Upload .py file using the cog icon
wget the dataset and move both into the same folder
Run script using python gcp.py
Just gives me an error saying file not found.
Thanks.
Regarding your first and third approaches: if you are running PySpark code on Dataproc, whether from an .ipynb file or a .py file, please note the points below:
If you use the 'wget' command to download the file, it is downloaded into the current working directory where your code is executed.
When you try to access the file through the PySpark code, it looks in HDFS by default. If you want to access the downloaded file from the current working directory, use the "file:///" URI with the absolute file path.
If you want to access the file from HDFS, you have to move the downloaded file to HDFS and then access it from there using an absolute HDFS file path. For example:
hadoop fs -put <local file_name> </HDFS/path/to/directory>
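For the "file:///" option, a minimal sketch of building such a URI in Python is shown here. The helper name and the data.csv file name are made up for illustration, and passing the result to spark.read.csv assumes an existing SparkSession named spark.

```python
from pathlib import Path


def local_file_uri(filename, workdir=None):
    """Build an absolute file:// URI for `filename` in `workdir`
    (default: the current working directory, where wget saves files)."""
    base = Path(workdir) if workdir is not None else Path.cwd()
    # resolve() makes the path absolute, which as_uri() requires.
    return (base / filename).resolve().as_uri()


# Illustrative usage; "data.csv" and the `spark` session are assumptions:
# df = spark.read.csv(local_file_uri("data.csv"), header=True)
```

Note that on a multi-node cluster a file:/// path is only useful if the file exists at that path on the nodes that read it, which is one reason the HDFS option may be preferable.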
I am trying to copy data from a file store to a Google Cloud Storage bucket.
This is the command I am using:
gsutil rsync -r /fileserver/demo/dir1 gs://corp-bucket
corp-bucket: Name of my bucket
/fileserver/demo/dir1: Mount point directory (this directory contains the data of the file store)
This command works fine the first time: it copies the data in the directory /fileserver/demo/dir1 to the cloud bucket. But if I then delete the data from the cloud bucket and run the same command again without any changes, I get this error:
cannot open path of the current working directory: Permission denied
NOTE: If I make even a small change to a file in /fileserver/demo/dir1 and run the above command, it works fine again. My question is why it does not work without any changes, and whether there is a way to copy the files without changing them.
Thanks.
You may be hitting limitation #2 of rsync: "The gsutil rsync command considers only the live object version in the source and destination buckets." You can exclude /dir1... with the -x pattern and still let rsync do the cleanup work as in the regular sync process.
Another way to copy those files is to use cp with the -r option, which copies recursively, instead of rsync.
I need to copy files from Google Cloud Storage to my local machine.
I tried this command in the terminal of Compute Engine:
$sudo gsutil cp -r gs://mirror-bf /var/www/html/mydir
/var/www/html/mydir is my directory on the local machine.
I get this error:
CommandException: Destination URL must name a directory, bucket, or bucket
subdirectory for the multiple source form of the cp command.
Where is the mistake?
You must first create the directory /var/www/html/mydir.
Then, you must run the gsutil command on your local machine and not in the Google Cloud Shell. The Cloud Shell runs on a remote machine and can't deal directly with your local directories.
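If you prefer to script these two steps, a minimal sketch in Python could look like this. It assumes gsutil is installed and authenticated on the local machine. The function name is hypothetical, and the run parameter exists only so the command can be inspected without actually calling gsutil.

```python
import os
import subprocess


def download_bucket(bucket_url, dest_dir, run=subprocess.run):
    """Create `dest_dir` if needed, then copy `bucket_url` into it
    recursively with gsutil."""
    # gsutil cp -r needs the destination directory to exist already.
    os.makedirs(dest_dir, exist_ok=True)
    cmd = ["gsutil", "-m", "cp", "-r", bucket_url, dest_dir]
    return run(cmd, check=True)


# Illustrative usage, run on the local machine (not in Cloud Shell):
# download_bucket("gs://mirror-bf", "/var/www/html/mydir")
```

Creating /var/www/html/mydir may still require elevated privileges, as in the sudo command in the question.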
I had a similar problem and went through the painful process of figuring it out too, so I thought I would provide my step-by-step solution (under Windows; hopefully it is similar for Unix users) in the hope that it helps others:
The first thing (as many others have pointed out on various Stack Overflow threads): you have to run a local console (in admin mode) for this to work (i.e. do not use the Cloud Shell terminal).
Here are the steps:
Assuming you already have Python installed on your machine, you will then need to install the gsutil python package using pip from your console:
pip install gsutil
You will then be able to run the gsutil config from that same console:
gsutil config
A .boto file needs to be created. It is needed to make sure you have permission to access your drive.
Also note that you are now given a URL, which is needed in order to get the authorization code (prompted for in the console).
Open a browser and paste this URL in, then:
Log in to your Google account (i.e. the account linked to your Google Cloud).
Google asks you to confirm that you want to give access to gsutil. Click Allow.
You will then be given an authorization code, which you can copy and paste into your console.
Finally you are asked for a project-id:
Get the project ID of interest from your Google Cloud.
To find these IDs, click on "My First Project".
You will then be shown a list of all your projects and their IDs.
Paste that ID into your console, hit Enter, and there you are! You have now created your .boto file. This should be all you need to be able to work with your Cloud Storage.
Console output:
Boto config file "C:\Users\xxxx\.boto" created. If you need to use a proxy to access the Internet please see the instructions in that file.
You will then be able to copy your files and folders from the cloud to your PC using the following gsutil Command:
gsutil -m cp -r gs://myCloudFolderOfInterest/ "D:\MyDestinationFolder"
Files from within "myCloudFolderOfInterest" should then get copied to the destination "MyDestinationFolder" (on your local computer).
gsutil -m cp -r gs://bucketname/ "C:\Users\test"
I put an "r" before the file path, i.e. r"C:\Users\test", and got the same error. When I removed the "r", it worked for me.
Try the path with a leading '.', as in ./var:
$sudo gsutil cp -r gs://mirror-bf ./var/www/html/mydir
Or it may be the problem described below:
gsutil cp does not support copying special file types such as sockets, device files, named pipes, or any other non-standard files intended to represent an operating system resource. You should not run gsutil cp with sources that include such files (for example, recursively copying the root directory on Linux that includes /dev). If you do, gsutil cp may fail or hang.
Source: https://cloud.google.com/storage/docs/gsutil/commands/cp
The syntax that worked for me when downloading to a Mac was:
gsutil cp -r gs://bucketname dir Dropbox/directoryname
I'm trying to move a file located within my app directory:
{MyAppRoot}/.aws_scripts/eb_config.js
to
{MyAppRoot}/config.js.
I need this mv or cp to happen before the app is actually restarted, as this file's presence is required immediately by the main app module. I've tried using various .ebextensions mechanisms like commands, container_commands, etc., but all fail with either "no stat" or "permission denied". I'm unable to get further details from eb_activity.log or any of the other log files. I came across this similar question on the AWS forums, but I'm not able to achieve any success.
What's the proper way to accomplish this? Thanks.
In commands, your project-specific files are not set up yet.
In container_commands, the files are in a temporary staging location, and the current working directory is that staging directory. The following should work:
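container_commands:
  # "01_copy_config" is an illustrative name; each entry needs a name and a command key
  01_copy_config:
    command: "cp .aws_scripts/eb_config.js config.js"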