I have a previously created jupyter notebook that I'd like to run on the Google Cloud Platform.
I currently have a notebook instance running on a GCP VM and it works fine. I was also able to create a storage bucket and upload all dataset and notebook files to the bucket. However, these files don't show up in the Jupyter Notebook directory tree. I know I can access the dataset files using something like...
from google.cloud import storage
from io import BytesIO

client = storage.Client()
bucket = client.get_bucket('name-of-bucket')
blob = storage.Blob('directory/to/files', bucket)
fid = BytesIO(blob.download_as_string())
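For example, assuming the blob is a CSV and pandas is available in the notebook environment, I can read it straight from that BytesIO object:

import pandas as pd    # assumed to be installed in the notebook environment
df = pd.read_csv(fid)  # 'fid' is the BytesIO object from above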
But I'm not sure how to actually serve up a notebook file to use, and I really don't feel like copying and pasting all my previous work.
All help appreciated!
Very simple. You can upload directly from within the Jupyter Notebook and bypass the bucket if desired (the icon with the up arrow).
[Image: the Jupyter Notebook upload button]
The only issue with this is that you can't upload folders, so zip the folder first, then upload it.
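If you need to unpack the archive after uploading it, a couple of lines in a notebook cell will do (the archive name here is just a placeholder):

import zipfile

with zipfile.ZipFile('my_project.zip') as zf:
    zf.extractall('.')   # extract into the notebook's working directory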
You can use Jupyter Lab's git extension to host your Notebooks in GitHub and pull them from there.
FYI, if you use GCP's AI Platform Notebooks you'll get a pre-configured Jupyter environment with many ML/DL libraries pre-installed. That git extension will be pre-installed as well.
Related
I have a Vertex AI notebook that contains a lot of Python scripts and Jupyter notebooks, as well as pickled data files. I need to move these files to another notebook. There isn't a lot of documentation on Google's help center.
Has anyone had to do this yet? I'm new to GCP.
Can you try the steps in this article? It says you can copy your files to a Google Cloud Storage bucket and then move them to a new notebook using the gsutil tool.
In your notebook's terminal, run this command to copy the files to your Google Cloud Storage bucket:
gsutil cp -R /home/jupyter/* gs://BUCKET_NAME/PATH
Then open a terminal in the target notebook and run this command to copy the files into that notebook:
gsutil cp gs://BUCKET_NAME/PATH/* /home/jupyter/
Just change BUCKET_NAME to the name of your Cloud Storage bucket and PATH to the folder you want to use inside it.
I'm assuming that both notebooks are in the same GCP project and that you have the same permissions on both, OK?
There are many ways to do that... Listing some here:
The hardest to execute but the simplest in concept: you can download everything from the original notebook instance to your computer/workstation, then go to the second notebook instance and upload everything there.
You can use Google Cloud Storage, the object storage service, as the medium for moving the files. To do that you need to (1) create a storage bucket, (2) copy the data from the instance to the bucket using the original notebook instance's terminal, and (3) copy the data from the bucket to the instance using the target notebook instance's terminal.
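As a rough sketch, step (2) could also be done with the Python storage client instead of gsutil, run from the source instance (the bucket name and source directory below are placeholders):

import os
from google.cloud import storage

client = storage.Client()
bucket = client.bucket('my-transfer-bucket')   # placeholder bucket name

src_dir = '/home/jupyter'
for root, _, files in os.walk(src_dir):
    for name in files:
        path = os.path.join(root, name)
        blob_name = os.path.relpath(path, src_dir)   # keep the folder structure
        bucket.blob(blob_name).upload_from_filename(path)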
I am new to Google Cloud, and I need to run a single Python script on a Compute Engine instance.
I opened a new Compute Engine VM instance, created a new bucket, and uploaded the script to the bucket. I can see that the VM is connected to the bucket, since when I run the command to list the buckets from the VM it finds the bucket and shows that the script is indeed there.
What I'm missing is how to actually run the script. Or, more generally, how do I access these files?
I was looking for a suitable command but could not find any, though I have a feeling there should be one (since the VM can find the bucket and the files contained in it, I guess it can also access them somehow). How should I proceed to run the script from here?
The bucket's content is not attached to a volume in the VM; they are totally independent. That being said, you first have to copy the Python file from the bucket to your compute instance using the gsutil cp command, as below:
gsutil cp gs://my-bucket/main.py .
Once you have the file locally on your compute instance, you can simply run the Python file.
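If you'd rather not use gsutil, a rough alternative sketch is to fetch the script with the Python storage client and then run it (the bucket and file names below are placeholders):

import subprocess
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket('my-bucket')                  # placeholder bucket name
bucket.blob('main.py').download_to_filename('main.py')   # copy the script locally

subprocess.run(['python3', 'main.py'], check=True)       # then run it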
I am a newbie to AWS. I launched an EMR cluster and intend to run JupyterLab on top of it for an analytics Spark project.
In order to access S3, I need my IAM access credentials, which reside in the dl.cfg file in the same directory as the notebook (prototyping.ipynb), but I got an error telling me that dl.cfg doesn't exist.
Has anyone experienced such a situation? Is the dl.cfg file in the wrong directory?
I even tried the full S3 URI and it didn't work.
Thanks.
When I open AWS Notebook Instance -> Jupyter Notebook, it gives me storage (probably called an S3 bucket). I created a folder there and tried to upload thousands of data files. However, it asks me to manually click the upload button next to every single file. Is there an easier way to upload that data?
You could use the AWS CLI or the AWS S3 SDK.
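As a rough illustration, a bulk upload with boto3 (the Python SDK) might look like the following; the local folder and bucket names are placeholders:

import os
import boto3

s3 = boto3.client('s3')
local_dir = 'data'              # local folder holding the files to upload
bucket = 'your-bucket-name'     # placeholder target bucket

for root, _, files in os.walk(local_dir):
    for name in files:
        path = os.path.join(root, name)
        key = os.path.relpath(path, local_dir)   # keep the folder structure as the object key
        s3.upload_file(path, bucket, key)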
I want to download a file over 20GB from the internet into a google cloud bucket directly. Just like doing in a local command line the following:
wget http://some.url.com/some/file.tar
I refuse to download the file to my own computer and then copy it to the bucket using:
gsutil cp file.tar gs://the-bucket/
For the moment I am trying (just at this very moment) to use Datalab to download the file and then copy it from there to the bucket.
A capability of the Google Cloud Platform as it relates to Google Cloud Storage is the functional area known as the Storage Transfer Service. The documentation for this is available here.
At the highest level, this capability allows you to define a source of data that is external to Google, such as data available at a URL or in AWS S3 storage, and then schedule it to be copied to Google Cloud Storage in the background. This seems to perform exactly the task you want: the data is copied from an Internet source to GCS directly.
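As a very rough sketch (the project, bucket, and URL-list names below are placeholders), such a transfer can be created with the google-cloud-storage-transfer client; the service expects the source URLs to be listed in a TSV file it can reach:

from google.cloud import storage_transfer

client = storage_transfer.StorageTransferServiceClient()

job = client.create_transfer_job({
    'transfer_job': {
        'project_id': 'my-project',                        # placeholder project
        'status': storage_transfer.TransferJob.Status.ENABLED,
        'transfer_spec': {
            'http_data_source': {'list_url': 'https://example.com/urls.tsv'},
            'gcs_data_sink': {'bucket_name': 'the-bucket'},
        },
    },
})
client.run_transfer_job({'job_name': job.name, 'project_id': 'my-project'})   # kick off the transfer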
A completely different story is the realization that GCP itself provides compute capabilities. This means you can run your own logic on GCP through simple mechanisms such as a VM, Cloud Functions, or Cloud Run. In this case, you could execute code from within GCP that downloads the Internet-based data to a local temp file and then uploads that file into GCS, again from within GCP. At no time does the data that ends up in GCS go anywhere other than from the source to Google. Once retrieved from the source, the transfer rate from the GCP compute to GCS storage should be optimal, as it passes exclusively over Google's internal ultra-high-speed networks.
You can run curl http://some.url.com/some/file.tar | gsutil cp - gs://YOUR_BUCKET_NAME/file from inside Cloud Shell on GCP. That way it never uses your own network and stays entirely within GCP.
For large files, one-liners will very often fail, as will the Google Storage Transfer Service. Part two of Kolban's answer is then needed, and I thought I'd add a little more detail, as it can take time to figure out the easiest way of actually downloading to a Google Compute Engine instance and uploading to a bucket.
For many users, I believe the easiest way will be to open a notebook from the Google AI Platform and do the following:
%pip install wget
import wget
from google.cloud import storage  # No install required

wget.download('source_url', 'temp_file_name')    # download to the notebook VM's local disk

client = storage.Client()
bucket = client.get_bucket('target_bucket')
blob = bucket.blob('upload_name')
blob.upload_from_filename('temp_file_name')      # upload the temp file to the bucket
There's no need to set up an environment, you get the convenience of notebooks, and the client will have automatic access to your bucket if the notebook is hosted under the same GCP account.
I found a similar post where it is explained that you can download a file from the web and copy it to your bucket in just one command line:
curl http://some.url.com/some/file.tar | gsutil cp - gs://YOUR_BUCKET_NAME/file.tar
I tried it with my own bucket and it works correctly, so I hope this is what you are expecting.