IPython notebook remote server on AWS

I'm running a remote IPython notebook server on an EC2 instance on AWS. The instance is running Ubuntu.
I followed this tutorial to set it up, and everything seems to work - I can access the notebook over HTTPS with a password and run code.
However, I can't seem to save changes to the notebook. It says "saving notebook" and then nothing happens (i.e., it still shows 'unsaved changes' at the top).
Any ideas would be greatly appreciated.
Edit: It's not a permissions problem, since running with sudo doesn't help.
When creating a new notebook on the remote server, I am able to save. The problem only occurs for notebooks pulled from my git repository. Also, when opening a problematic notebook and deleting all cells until it's absolutely empty, I can sometimes (!) save the empty notebook, and sometimes (!!) I still can't.

I've encountered an issue where notebooks wouldn't save on an nbserver on an AWS EC2 instance I set up in a similar manner via a different tutorial. It turned out I had to refresh and re-login using the password, because my browser would automatically log me out after a certain period. It might help to close the nbserver tab, go back to it, and see if it asks you to re-login.
Here are a few other things you can try:
Copy a problematic notebook onto the server with scp and try to open and save it, as opposed to going through a repo pull, to see if anything changes (see the sketch after this list).
Check whether the hanging "saving notebook" message appears for notebooks in certain directories.
Check the IPython console messages when you save a problematic notebook and see if anything there helps you pinpoint the issue.
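For the scp route, something along these lines works (the key path, host and notebook name are placeholders, not details from the question). It can also be worth ruling out that the git-pulled file is no longer valid notebook JSON, e.g. corrupted by merge conflict markers - a long shot, but cheap to check:

# copy the notebook straight from your laptop instead of pulling it through git
scp -i ~/.ssh/your-key.pem problematic.ipynb ubuntu@your-ec2-public-dns:~/notebooks/

# quick sanity check that the pulled file is still well-formed JSON
python -c "import json; json.load(open('problematic.ipynb'))"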

Related

AWS Glue Development Endpoint Not Working properly

I am trying to use a development endpoint to interactively run and edit ETL scripts, but there seem to be some issues with the development endpoint right after creating it: I am getting errors in the Scala/Python REPLs and am also unable to set up an SSH tunnel to the remote interpreter.
Let me explain exactly what I did. I created a development endpoint in the AWS console with all the default configurations. While creating the development endpoint I only provided three things: the development endpoint name, the IAM role, and my public SSH key. This is how it looks after creation.
Right after creating the endpoint I connect to the Spark/Python REPLs. I am able to connect to them successfully, but within a couple of minutes of connecting the REPLs start throwing errors without my writing a single line of code. This happens in every REPL on the development endpoint.
Also, when I try SSH tunneling to the remote interpreter to connect my local Zeppelin notebook, it throws "bind: Cannot assign requested address".
A couple of things are working, though:
I am able to SSH to the endpoint.
I created a SageMaker notebook in AWS Glue attached to this development endpoint, and that notebook seems to be working fine, although it surely adds extra cost and I don't want to keep using it.
Can anyone help me figure out what I am doing wrong? Am I missing any important steps that need to be done on the machine right after creating the development endpoint?
Thanks in Advance!
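For reference, the tunnel command the AWS Glue documentation describes for a local Zeppelin interpreter is roughly the following (the key path and endpoint address are placeholders, and the port and internal interpreter address should be double-checked against the SSH-tunnel snippet shown in the Glue console for your endpoint). The "bind: Cannot assign requested address" message usually comes from the local side of the -L forwarding, so forcing IPv4 with -4 or binding explicitly to 127.0.0.1 is worth a try:

# -4 forces IPv4; -N -T set up the forwarding without running a remote command
# verify 9007 and 169.254.76.1 against the tunnel command your Glue console shows
ssh -4 -i your-dev-endpoint-key.pem -NT -L 127.0.0.1:9007:169.254.76.1:9007 glue@your-dev-endpoint-public-address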
I am not very sure about this error, but if you are working with smaller datasets then you may want to use the local Docker implementation instead, as it will not add any additional cost and you can carry on with your development.
You can refer to this blog on how to set it up:
https://towardsdatascience.com/develop-glue-jobs-locally-using-docker-containers-bffc9d95bd1
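If you go that route, the core of the setup from the blog is pulling the aws-glue-libs image and running it with the notebook ports published. A rough sketch (the image tag is the Glue 1.0 one from that era; check the blog or the image's Docker Hub page for the exact start command for the Jupyter or Zeppelin variant):

# pull the Glue 1.0 local-development image
docker pull amazon/aws-glue-libs:glue_libs_1.0.0_image_01

# run it with the Jupyter (8888) and Spark UI (4040) ports exposed
docker run -it -p 8888:8888 -p 4040:4040 --name glue_dev amazon/aws-glue-libs:glue_libs_1.0.0_image_01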

Google Cloud VM Files Deleted after session disconnect

Some of my GCP instances are behaving in a way similar to what is described in the link below:
Google Cloud VM Files Deleted after Restart
At times the session gets disconnected after a short period of inactivity. On reconnecting, the machine is as if it were freshly installed (not on restarts, as in the above link). All the files are gone.
As you can see in the attachment, it creates the profile directory fresh when the session is reconnected. Also, none of the installations I have made are there; everything is lost, including the root installations. Fortunately, I have been manually logging all my commands and file setups on my client, so nothing is unrecoverable, but I would like to know what is happening and resolve this for good.
This has now happened a few times.
A point to note is that if I get a clean exit, like properly logging out or exiting the SSH session, I get the machine back as I left it when I reconnect. The issue is only there when the session disconnects by itself, and even then there have been cases where the session disconnected and I was still able to connect back fine.
The issue is not there on all my VMs.
Regarding the suggestions from the link I posted above:
I am not connected to Cloud Shell; I am SSHing into the machine using the Chrome extension.
I have not manually mounted any disks (as far as I know).
I have checked the logs from gcloud compute instances get-serial-port-output --zone us-east4-c INSTANCE_NAME. I could not really make much of them. Is there anything I should look for specifically?
Any help is appreciated.
Please find the links to the logs, as suggested by @W_B.
Below is the log from the 8th, when the machine was restarted and the files were deleted:
https://pastebin.com/NN5dvQMK
It happened again today. I didn't run the command immediately that time; the file below is from afterwards, though:
https://pastebin.com/m5cgdLF6
The one below is from after logging out today:
https://pastebin.com/143NPatF
Please note that I have replaced the user ID, system name, and many numeric values in general using regexes, so there is a slight chance that the timestamps and other values have changed. Not sure if that would be a problem.
I have added the screenshot of the current config from the UI
Using a locally attached SSD seems to be the cause. Here it is explained:
https://cloud.google.com/compute/docs/disks/local-ssd#data_persistence
You need to use a persistent disk; otherwise it will behave just as you describe.
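A quick way to confirm this is to look at the instance's disks: a SCRATCH entry means a local SSD (non-persistent), while PERSISTENT means a persistent disk. A minimal check, using the zone from the question (replace INSTANCE_NAME):

gcloud compute instances describe INSTANCE_NAME --zone us-east4-c --format="yaml(disks)"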

Error when trying to connect to a Cloud SQL instance using the Cloud Shell

I've had a Cloud SQL instance for about a year now.
I always accessed it the same way:
I would go to my project on the Cloud Console.
Click on the Cloud Shell icon at the top right (a small right pointing arrow).
A black shell screen would pop up where I would type
gcloud sql connect <my instance> --user=root.
Enter my password.
Now, all of a sudden, I am getting an error message saying:
There was no instance found at projects//instances/ or you are not authorized to connect to it.
I am the owner of the project, and also have Admin rights to the Cloud SQL instance. The project and instance are still there, and my app that accesses the data stored in the instances' database is working fine - therefore I know the database is also present, otherwise my app wouldn't work.
I didn't touch or change anything in the Cloud SQL instance. Suddenly, I simply can't access my database using the exact same procedure I have been using almost every day over the past year now.
I am able to access the database using a local Python script on my laptop and the Cloud SQL Proxy, but I would like to access it from the Cloud Shell again.
Any ideas on what the problem could be?
gcloud components update - update all of your installed components to the latest version
gcloud init - reinitialize gcloud shell. It performs the following setup steps:
Authorizes gcloud and other SDK tools to access Google Cloud Platform using your user account credentials, or from an account of your choosing whose credentials are already available.
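The empty path in the error message (projects//instances/) also suggests the shell has lost track of its active project, so it is worth confirming that before connecting; a minimal sketch (project and instance names are placeholders):

# show which project the shell currently points at
gcloud config list project
# set it explicitly if it is empty or wrong
gcloud config set project YOUR_PROJECT_ID
# confirm the instance is visible, then connect as before
gcloud sql instances list
gcloud sql connect YOUR_INSTANCE --user=root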
It seems like there was a problem with GCP Cloud Shell (even though there was no mention of it on the GCP error-tracking page). When I logged back in today and followed the same process as above, everything worked well.
It looks like GCP Cloud Shell can occasionally go rogue and start producing errors. Word of advice: don't panic when this happens (like I did) and start resetting, rebooting and messing things up. Just wait a day and check back again.

Browser drops connection during model training

I am currently trying to run a fairly long hyperparameter grid search (4-5 hours) and I keep having issues with JupyterLab (or haven't figured something out yet) on a GCP notebook instance. The browser connection to the notebook keeps dropping, whereas the training process continues just fine. When the training process finishes, there is nowhere to write the output, as the browser connection to the notebook has already dropped.
How can I keep that connection alive, or make sure the output gets written into the notebook even if my laptop disconnects or gets turned off?
There are multiple problems that may be affecting your notebook. It could be a GCP issue, a network issue... Therefore, you need to provide more information in order to diagnose what is happening. I would recommend you open a ticket with GCP or Jupyter support to conduct a more thorough investigation, as it can be something difficult to diagnose and they will have more tools to do it. Also, what @Joaquim suggested seems like a good workaround for the moment. Anyhow, I have gathered several troubleshooting steps that you can follow to find out whether one of these recurring issues is the one affecting you:
According to this Jupyter Notebook document, there is a ‘shutdown_no_activity_timeout’ option. The default value is ‘0’, which disables the automatic shutdown. The option might be overridden in the ‘jupyter_notebook_config.py’ file. You can follow these steps to confirm it:
Click on the name of the instance your notebook is running on, on the AI Platform Notebooks page.
Access it remotely by clicking “SSH”.
Run this on the shell to confirm the existence of the overriding:
ls /home/*/.jupyter/jupyter_notebook_config.py
Run this command to confirm whether the shutdown_no_activity_timeout option is doing the overriding:
cat /home/*/.jupyter/jupyter_notebook_config.py | grep shutdown_no_activity_timeout
Switch the option to ‘0’ if it is set to a different value, and reset the Notebook instances on this page to apply the change.
According to this other document, it might fail to connect when behind a proxy. You can try to disable your browser’s proxy settings.
You can also try changing the Jupyter port. On this Jupyter issue, the user reports that their disconnection problem went away after changing it. If you are using the Chrome browser, open the Inspect panel (Ctrl+Shift+I) and compare your connection symptoms with this image; if you get similar errors, try changing the port (c.NotebookApp.port).
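Both the idle-timeout option from the steps above and the port option live in jupyter_notebook_config.py; a minimal sketch of the two lines to set (the port number here is just an arbitrary non-default example):

c.NotebookApp.shutdown_no_activity_timeout = 0   # 0 disables the idle auto-shutdown
c.NotebookApp.port = 8899                        # try a non-default port if the connection keeps dropping

Restart the notebook service (or reset the instance as described above) for the change to take effect.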

(GCP) How to keep a Jupyter session connected after disconnecting from my local laptop?

I keep a Jupyter server running on a GCP VM instance with tmux.
The problem is that I want to keep fitting my model after leaving the Jupyter session from my local laptop
(e.g. I turn off my laptop, but the Jupyter session is still alive and fitting the model, and I am able to reconnect to that session to check its status).
The only way I have come up with is to put the code in a .py file and execute python3 fitting.py, but I want to run and fit the model in a Jupyter notebook so I can monitor it without adding extra code.
If there is a possible way to do this, please kindly let me know.
Thanks!
Have you considered using the Fairing library? It comes pre-installed with GCP's new AI Platform Notebooks.
This library allows you to pack up your notebook and send it off for remote execution. A new notebook with the executed content will be saved to your GCP Storage bucket. No active internet connection is required once you kick off the notebook run.
You can learn how to use it by creating a new GCP AI Platform Notebook and looking at the tutorials folder inside it. You can also find additional tutorials for Fairing here
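If Fairing feels like too much machinery, a lighter-weight alternative (not what this answer describes, just a common pattern) is to execute the notebook headlessly inside the tmux session you already use, so the kernel and the written output no longer depend on the browser; the notebook name is a placeholder:

# runs all cells and writes a fully executed copy, with no per-cell timeout
jupyter nbconvert --to notebook --execute fitting.ipynb --output fitting_run.ipynb --ExecutePreprocessor.timeout=-1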