Why can't my GCP script/notebook find my file? - google-cloud-platform

I have a working script that finds the data file when it is in the same directory as the script. This works both on my local machine and Google Colab.
When I try it on GCP, though, it cannot find the file. I tried three approaches:
PySpark Notebook:
Upload the .ipynb file, which includes a wget command. This downloads the file without error, but I am unsure where it saves it, and the script cannot find the file either (I assume because I am telling it the file is in the same directory, and presumably wget on GCP saves it somewhere else by default).
PySpark with bucket:
I did the same as the PySpark notebook above, but first I uploaded the dataset to the bucket and then used the two links provided in the file details when you click the file name inside the bucket on the console (neither worked). I would like to avoid this approach anyway, as wget is much faster than downloading over my slow wifi and then re-uploading to the bucket through the console.
GCP SSH:
Create cluster
Access VM through SSH.
Upload .py file using the cog icon
wget the dataset and move both into the same folder
Run script using python gcp.py
This just gives me an error saying the file was not found.
Thanks.

Regarding your first and third approaches: if you are running PySpark code on Dataproc, irrespective of whether you use an .ipynb file or a .py file, please note the points below:
If you use the wget command to download the file, it will be downloaded to the current working directory where your code is executed.
When you try to access the file through PySpark code, it will look in HDFS by default. If you want to access the downloaded file from the current working directory, use the "file:///" URI scheme with an absolute file path.
If you want to access the file from HDFS, you have to move the downloaded file into HDFS and then access it there using an absolute HDFS file path. For example:
hadoop fs -put <local file_name> </HDFS/path/to/directory>
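For illustration, assuming you run wget from /home/your_user on the cluster (the URL, user name, and file names below are placeholders, not from the original question), the two options look roughly like this:
# the download lands in the current working directory, here /home/your_user
wget https://example.com/dataset.csv
# option 1: read it from the local filesystem in your PySpark code:
#   spark.read.csv("file:///home/your_user/dataset.csv")
# option 2: move it into HDFS first, then read it with an HDFS path:
hadoop fs -put /home/your_user/dataset.csv /user/your_user/dataset.csv
#   spark.read.csv("hdfs:///user/your_user/dataset.csv")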

Related

GCP copy files in from VM to local

I'm trying to copy files from my VM to my local computer.
I can do this with the standard command
sudo gcloud compute scp --recurse orca-1:/opt/test.txt .
However, when downloading the log files, they transfer but they're empty (empty files are created with the same names).
I'm also unable to use the Cloud Shell 'Download' UI button because it gives "No such file" despite the absolute file path being correct (cat /path returns the data).
I assume it's somehow a permissions issue with the log files?
Thanks for the replies to my thread above; I figured out it was a permissions issue on my files.
Interestingly, the first time I ran the commands it did not throw any errors or permission errors; it downloaded all the expected files, but they were empty. When testing again, it did throw permission errors. I then modified the files in question to have public read permissions, and the download succeeded.
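A hedged sketch of that fix, reusing the path and VM name from the question:
# on the VM: give the files world-read permission so the SSH user can read them
sudo chmod o+r /opt/test.txt
# then from the local machine, the copy should bring down the actual contents
sudo gcloud compute scp --recurse orca-1:/opt/test.txt .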

gcloud builds submit command is not working as per the documentation

I'm trying to build the image using the gcloud builds submit command, passing the source as a GCS bucket as per the documented syntax, but it's not working.
gcloud builds submit gs://bucket/object.zip --tag=gcr.io/my-project/image
Error: -bash: gs://bucket_name/build_files.zip: No such file or directory
This path exists in the GCP project where I'm executing the command, but it still says no such file or directory.
What am I missing here?
Cloud Build looks for a local file or a tar.gz file on Google Cloud Storage.
In the case of a zip file like yours, the solution is to download the file locally, unzip it, and then launch your Cloud Build.
Indeed, you need to unzip the file; Cloud Build won't do it for you, as it can only ungzip and untar files. When you add the --tag parameter, Cloud Build looks for a Dockerfile in your set of files and runs a docker build with it.
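A rough sketch of that workaround, assuming the archive has a Dockerfile at its top level (the local directory name is just an example):
# download the archive, unpack it, and submit the unpacked directory as the build source
gsutil cp gs://bucket/object.zip .
unzip object.zip -d build_src
gcloud builds submit build_src --tag=gcr.io/my-project/image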
Please try single quotes (') or double quotes (") around gs://bucket/object.zip, and not backquotes (`), so the command looks like this:
gcloud builds submit 'gs://bucket/object.zip' --tag=gcr.io/my-project/image
It looks like there is an issue with the documentation; the changes have now been submitted to Google.

cannot open path of the current working directory: Permission denied when trying to run rsync command twice in gcloud

I am trying to copy the data from the file stores to the google cloud bucket.
This is the command I am using:
gsutil rsync -r /fileserver/demo/dir1 gs://corp-bucket
corp-bucket: Name of my bucket
/fileserver/demo/dir1: Mount point directory (this directory contains the data of the file store)
This command works fine the first time: it copies the data from the directory /fileserver/demo/dir1 to the cloud bucket. But when I then delete the data from the cloud bucket and run the same command again without any changes, I get this error:
cannot open path of the current working directory: Permission denied
NOTE: If I make even a small change to a file in /fileserver/demo/dir1 and run the above command, it works fine again. My question is why it does not work without any changes, and whether there is any way to copy the files without making changes.
Thanks.
You may be hitting limitation #2 of rsync: "The gsutil rsync command considers only the live object version in the source and destination buckets". You can exclude /dir1... with the -x pattern and still let rsync do the cleanup work as part of the regular sync process.
Another way to copy those files would be to use gsutil cp with the -r option to copy recursively instead of using rsync.
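As a hedged illustration of both suggestions, using the paths from the question (the exclusion regex is only an example):
# alternative: recursive copy instead of rsync
gsutil cp -r /fileserver/demo/dir1 gs://corp-bucket
# or keep rsync but exclude a sub-path with -x (the regex shown is illustrative)
gsutil rsync -r -x 'some_subdir/.*' /fileserver/demo/dir1 gs://corp-bucket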

Create a txt file with a SageMaker Lifecycle Configuration script

I'm unable to get a SageMaker Lifecycle Configuration to create a plain .txt file in the directory with my Jupyter notebooks when the SageMaker notebook starts.
In the future I'll add text to this file, but creating the file is the first step.
Start notebook script:
#!/bin/bash
set -e
touch filename.txt
Note: I have edited my notebook to add this lifecycle configuration.
But when the notebook starts and I open it, the file does not exist. Is this possible?
You are creating the file in the root directory.
Use the terminal option of your notebook to explore.
The working directory of the Lifecycle Configuration script is "/" and Jupyter starts up from the "/home/ec2-user/SageMaker" directory. So, if you create a file outside "/home/ec2-user/SageMaker", it will not be visible inside the Jupyter file browser.
To address this, you can modify your script to
touch /home/ec2-user/SageMaker/filename.txt
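Putting it together, a minimal version of the start-notebook script would be:
#!/bin/bash
set -e
# create the file where the Jupyter file browser (rooted at /home/ec2-user/SageMaker) can see it
touch /home/ec2-user/SageMaker/filename.txt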
Thanks for using Amazon SageMaker!

Download File - Google Cloud VM

I am trying to download a copy of my mysql history to keep on my local drive as a safeguard.
Once the Download File option is selected, a dropdown menu appears and I am prompted to enter the file path for the download.
But after all the variations of the path I can think of, I keep receiving an error message.
Download File means that you are downloading a file from the VM to your local computer, so the expected path is the path to a file on the VM.
If instead you want to upload c:\test.txt to your VM, select Upload File and then enter c:\test.txt. The file will be uploaded to your home directory on the VM.
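For example, if the history you are after is the mysql client history in your home directory on the VM, the path to enter would be something like /home/your_user/.mysql_history (that location is an assumption, not from the original question). The same file can also be copied down from your local machine with gcloud (the VM name is a placeholder):
# copy the mysql history from the VM's home directory to the current local directory
gcloud compute scp your-vm-name:~/.mysql_history .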