I'm currently using a SageMaker notebook instance (not SageMaker Studio), and I want to run a notebook that is expected to take around 8 hours to finish. I want to leave it running overnight and see the output of each cell; the output is a combination of print statements and plots.
However, when I start running the notebook, make sure the initial cells run, and then close the JupyterLab tab in my browser, I find that the notebook has stopped when I open it again a few minutes later to check on it.
Is there a way to keep using my notebook as it is, see the output of each cell (prints and plots), and not have to keep the JupyterLab tab open (so I can turn my laptop off, etc.)?
When you close the tab, the browser stops receiving output from the kernel, so anything the cells print or plot while the tab is closed is lost (and in some setups the kernel is shut down as well). If you want your jobs to keep running after you close the Jupyter tab, I would recommend looking into SageMaker Processing or Training jobs for your workloads. Alternatively, this link provides some options for keeping the notebook running with the tab closed.
Answering my own question.
I ended up using SageMaker Processing jobs for this, as the other answer suggested. I found a library released a few months ago, SageMaker run notebook, which let me keep my notebook's structure and cells as they were. I edit the notebook on a smaller instance and use SageMaker run notebook to execute it on a bigger one.
The output of each cell, along with my plots, was saved to S3 as a Jupyter notebook.
The library doesn't appear to be actively maintained, but you can fork it and adapt it to your requirements, for example by building a Docker container that fits your needs.
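The executed notebook written to S3 is an ordinary `.ipynb` file, i.e. JSON. After downloading it, a small standard-library sketch like this one lists what each code cell printed or displayed, without starting Jupyter. It assumes the standard nbformat v4 layout. The function name is illustrative and is not part of the library.

```python
import json


def summarize_outputs(ipynb_path):
    """List (cell_index, output_type, preview) for every output in a notebook.

    cell_index counts all cells from 1, markdown included, so it matches the
    position you would see when scrolling through the notebook.
    """
    with open(ipynb_path, encoding="utf-8") as f:
        notebook = json.load(f)

    summary = []
    for index, cell in enumerate(notebook.get("cells", []), start=1):
        if cell.get("cell_type") != "code":
            continue
        for output in cell.get("outputs", []):
            kind = output.get("output_type")
            if kind == "stream":
                # Stream text may be stored as one string or as a list of lines.
                text = output.get("text", "")
                preview = text if isinstance(text, str) else "".join(text)
            elif kind in ("display_data", "execute_result"):
                # Plots usually appear here as an "image/png" entry.
                preview = ", ".join(sorted(output.get("data", {})))
            elif kind == "error":
                preview = f"{output.get('ename')}: {output.get('evalue')}"
            else:
                preview = ""
            summary.append((index, kind, preview))
    return summary
```

For example, `for index, kind, preview in summarize_outputs("output.ipynb"): print(index, kind, preview[:80])`, where `output.ipynb` is a placeholder for the downloaded file.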
Currently, when you double-click a notebook in SageMaker that you've run before, it opens with its previous kernel and spins up (rents) an instance.
Sometimes I just want to view the notebook without starting a kernel session.
How can I disable this automatic behaviour and only pick a kernel when I explicitly choose one from the dropdown?
I've looked for documentation but can't find anything on this.
This is expected behavior in Studio: it picks up the instance type from the notebook's metadata. There is a lifecycle configuration (LCC) script you can use to avoid this behavior: https://github.com/aws-samples/sagemaker-studio-lifecycle-config-examples/pull/26
I get disconnected every now and then when running code in Jupyter notebooks on SageMaker. I usually just restart the notebook and run all the cells again, but I want to know whether there is a way to reconnect to my instance without losing my progress. At the moment the bottom bar shows "No Kernel", but my file still appears active in the kernel sessions tab. Can I recover my notebook's variables and contents? Also, is there a way to prevent future kernel disconnections?
Note that I downgraded tornado to 5.1.1, which seems to reduce the number of disconnections, but they still happen every now and then.
Disconnections are often caused by inactivity, when a job runs for a long time with no user input. If pre-processing is what takes a long time, you could use a larger instance for the processing job so that it runs faster, or increase the instance count. If you're using EMR, since December 2021 you can run EMR Spark queries directly on the EMR cluster from SageMaker Studio:
https://aws.amazon.com/about-aws/whats-new/2021/12/amazon-sagemaker-studio-data-notebook-integration-emr/
This blog post is helpful for getting up and running: https://aws.amazon.com/blogs/machine-learning/build-amazon-sagemaker-notebooks-backed-by-spark-in-amazon-emr/
Please let me know if you need more information, or vote for the answer if it's useful. :-)
A quick solution for me was to open a Terminal instead, save the notebook as a Python file, and run it from the terminal within SageMaker.
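`jupyter nbconvert --to script` is the standard tool for the "save as a Python file" step. The function below is an illustrative standard-library stand-in for it, not what nbconvert does internally, and it assumes the usual nbformat v4 layout. It keeps the code cells and comments out IPython-only lines (`%magic`, `!shell`), because plain Python cannot run them.

```python
import json


def notebook_to_script(ipynb_path, py_path):
    """Write the code cells of a notebook to a plain Python script."""
    with open(ipynb_path, encoding="utf-8") as f:
        notebook = json.load(f)

    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # source may be a string or a list of lines
            source = "".join(source)
        lines = []
        for line in source.splitlines():
            if line.lstrip().startswith(("%", "!")):
                lines.append("# " + line)  # IPython-only syntax, not valid Python
            else:
                lines.append(line)
        chunks.append("\n".join(lines))

    with open(py_path, "w", encoding="utf-8") as f:
        f.write("\n\n".join(chunks) + "\n")
```

You can then launch the script from the Terminal with something like `nohup python my_notebook.py > run.log 2>&1 &` (the file names are placeholders). It keeps running even if the terminal session is closed, and all printed output ends up in `run.log`.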
I am running a fairly long hyperparameter grid search (4-5 hours) on a GCP notebook instance and keep running into issues with JupyterLab (or I haven't figured something out yet). The browser connection to the notebook keeps dropping, while the training process itself continues just fine. When training finishes, there is nowhere to write the output, because the browser connection to the notebook has already dropped.
How can I keep that connection alive, or make sure the output gets written into the notebook even if my laptop is turned off?
Many different problems could be affecting your notebook: a GCP issue, a network issue, and so on. More information is needed to diagnose what is happening. I recommend opening a ticket with GCP or Jupyter support for a more thorough investigation, since this kind of problem can be hard to diagnose and they have more tools for it. In the meantime, what @Joaquim suggested seems like a good workaround. I have also gathered several troubleshooting steps you can follow to check whether one of these recurring issues is the one affecting you:
According to this Jupyter Notebook document, there is a ‘shutdown_no_activity_timeout’ option. Its default value is ‘0’, which disables automatic shutdown, but the option might be overridden in a ‘jupyter_notebook_config.py’ file. You can follow these steps to check:
1. On the AI Platform Notebooks page, click the name of the instance your notebook is running on.
2. Connect to it remotely by clicking “SSH”.
3. Run this in the shell to check whether an overriding config file exists:
   ls /home/*/.jupyter/jupyter_notebook_config.py
4. Run this command to check whether that file sets the shutdown_no_activity_timeout option:
   grep shutdown_no_activity_timeout /home/*/.jupyter/jupyter_notebook_config.py
5. If the option is set to anything other than ‘0’, change it to ‘0’, then reset the notebook instance from the AI Platform Notebooks page to apply the change.
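If you would rather script the two shell checks above, this standard-library sketch does the same thing in Python. It finds the config files, ignores commented-out lines, and reports the timeout each file actually sets. The default glob pattern mirrors the path used above. The function name is illustrative, and reading other users' home directories may require elevated permissions.

```python
import glob
import re

# An active (not commented-out) assignment such as
# "c.NotebookApp.shutdown_no_activity_timeout = 600".
_TIMEOUT_RE = re.compile(
    r"^\s*c\.\w+\.shutdown_no_activity_timeout\s*=\s*(\d+)", re.MULTILINE
)


def find_timeout_overrides(pattern="/home/*/.jupyter/jupyter_notebook_config.py"):
    """Map each config file matching pattern to the timeout (seconds) it sets.

    Files that do not set the option are omitted. If a file assigns it more
    than once, the last assignment wins, as it would when Python runs the file.
    """
    overrides = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as f:
            values = _TIMEOUT_RE.findall(f.read())
        if values:
            overrides[path] = int(values[-1])
    return overrides
```

An empty result means no matching config file overrides the default of 0.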
According to this other document, Jupyter may fail to connect when you are behind a proxy, so you can try disabling your browser’s proxy settings.
You can also try changing the Jupyter port. In this Jupyter issue, the reporter says his disconnection problem went away after changing it. If you are using Chrome, open the Inspect panel (Ctrl+Shift+I) and compare your connection errors with this image. If you see similar errors, try changing the port (c.NotebookApp.port).
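Both settings discussed in this answer, the inactivity timeout and the port, can be set in the same config file. The fragment below assumes a classic Notebook server that reads `~/.jupyter/jupyter_notebook_config.py`. The port value is a hypothetical example. Newer Jupyter Server-based installs read `jupyter_server_config.py` and use `c.ServerApp` instead of `c.NotebookApp`. On a managed instance the front-end proxy may expect a particular port, so check that before changing it.

```python
# ~/.jupyter/jupyter_notebook_config.py
# Jupyter's config loader supplies get_config(); this file is read by the
# server at startup, not run directly with `python`.
c = get_config()  # noqa: F821

# 0 (the default) disables shutting the server down after a period of no activity.
c.NotebookApp.shutdown_no_activity_timeout = 0

# Hypothetical example port; use one that is free on your VM.
c.NotebookApp.port = 8889
```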
I am running 2 instances under Google AI Platform, which basically launches 2 VM instances running JupyterLab. I had been happily creating notebooks on both VMs, and I shut down both VMs for the day...
What's strange is that the next morning, the notebook on one VM launches, but when I run any cell containing something simple like "import pandas", it never returns a result and hangs (showing a * where the cell number would normally appear). I created a brand-new notebook and ran just print("hello"); it also never returns. I restarted the instance a few times and it still doesn't work. I noticed that the kernel status "dot" in the top-right corner is filled black, which indicates the kernel is busy; I think it should be white after the kernel restarts, so there could be a problem with the kernel.
Any ideas what could be going wrong? I don't even know where to start debugging this. The strange thing is that the other VM still works. I don't want to do anything drastic like re-creating the VM, since I'd like to find and fix the actual cause.
Has anyone else experienced the same thing?
If you haven't tried this already, I would try refreshing the notebook window after restarting the machine.
I keep a Jupyter server running on a GCP VM instance using tmux.
The problem is that I want my model to keep fitting after I leave the Jupyter session on my local laptop. For example, I turn off my laptop, but the Jupyter session stays alive and keeps fitting the model, and I can reconnect to that session later to check its status.
The only way I have come up with is to put the code in a .py file and run python3 fitting.py, but I want to run and fit the model in a Jupyter notebook so I can monitor it without adding extra code.
If there is a way to do this, please let me know.
Thanks!
Have you considered using the Fairing library? It comes pre-installed with GCP's new AI Platform Notebooks.
This library lets you package your notebook and send it off for remote execution. A new notebook containing the executed output is saved to your Google Cloud Storage bucket, so no active internet connection is needed once you kick off the notebook run.
You can learn how to use it by creating a new GCP AI Platform Notebook and looking at the tutorials folder inside it. You can also find additional tutorials for Fairing here.
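The Jupyter server in the question already runs inside tmux, so the kernel keeps going after the laptop disconnects; what gets lost is the cell output. One way to stay in the notebook is to run the fitting loop on a background thread that logs progress to a file. The kernel stays free, so after reconnecting you can check `thread.is_alive()` or read the log from another cell. Below is a minimal sketch. `step_fn` stands in for one hypothetical unit of your training, for example one epoch.

```python
import threading
import time


def run_in_background(step_fn, n_steps, log_path):
    """Call step_fn(i) for i in range(n_steps) on a daemon thread.

    Progress is appended to log_path and flushed after every step, so it is
    on disk even if no browser is attached to the notebook.
    Returns the Thread so you can poll thread.is_alive() after reconnecting.
    """

    def worker():
        with open(log_path, "a", encoding="utf-8") as log:
            try:
                for i in range(n_steps):
                    result = step_fn(i)
                    stamp = time.strftime("%H:%M:%S")
                    log.write(f"{stamp} step {i + 1}/{n_steps}: {result}\n")
                    log.flush()
                log.write("done\n")
            except Exception as exc:
                # Record the failure in the log, since nobody may be watching the cell.
                log.write(f"failed: {exc!r}\n")
                raise

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

For example, run `thread = run_in_background(train_one_epoch, 50, "fit.log")`, where `train_one_epoch` is your own hypothetical function. After reconnecting, `thread.is_alive()` and the contents of `fit.log` show how far it got. A daemon thread stops when the kernel is shut down or restarted.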