I am still new to AWS SageMaker. I am working on an architecture where we would have an AWS SageMaker notebook with multiple users. I want to ensure that students can't see each other's work. Would I need to do that in the terminal, or can we do it in the notebook itself?
The simplest way is to create a small notebook instance for each student. This gives you the isolation you need, and it also makes each student responsible for stopping their own notebook when it is not in use.
The smallest instance type costs $0.0464 per hour. Running 24/7, that comes to about $33 per month. But if the students are responsible and stop their instances when they are not using them, it works out to about $1 for 20 hours of work.
If you want to enforce isolation between the notebooks, you can use the ability to presign the URL that is used to open the Jupyter interface. See the documentation on creating the URL with the CLI: https://docs.aws.amazon.com/cli/latest/reference/sagemaker/create-presigned-notebook-instance-url.html. It is also supported in the other SDKs.
aws sagemaker create-presigned-notebook-instance-url \
    --notebook-instance-name <student-instance-name> \
    --session-expiration-duration-in-seconds 3600
You can integrate this into your institution's internal portal.
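For example, if your portal is built in Python, a minimal boto3 sketch might look like the following. The instance name is a placeholder; create_presigned_notebook_instance_url is the boto3 equivalent of the CLI command above.

import boto3

sagemaker = boto3.client("sagemaker")

# "student-01-notebook" is a placeholder for the student's instance name.
response = sagemaker.create_presigned_notebook_instance_url(
    NotebookInstanceName="student-01-notebook",
    SessionExpirationDurationInSeconds=3600,  # URL session valid for one hour
)

# Hand this URL to the student (e.g. via your portal); it opens their
# Jupyter interface without sharing AWS console credentials.
print(response["AuthorizedUrl"])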
I have a Windows EC2 instance in place. I cannot delete it every day, since we have multiple tools installed on it, such as DBeaver for accessing a Postgres RDS instance. Now we have a task of deleting a few S3 folders. Using the MobaXterm tool, I can delete them via AWS CLI commands.
However, I am unable to schedule this script to run once daily in the morning. I explored a few posts, but they are not relevant to my problem: there, the user is trying to launch an instance > run a script > delete the instance, which I don't want to do.
What can be done in my case?
At least two options come to mind:
Use Windows Task Scheduler to create a task that runs your script daily, directly on the instance
Use AWS Systems Manager State Manager to run a custom document that executes your script remotely on a daily basis
I would recommend the second option, because you would be able to reuse it for other instances if needed. A sketch of that approach follows below.
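For illustration, here is a hedged boto3 sketch of the second option, using the AWS-managed AWS-RunPowerShellScript document rather than a custom one. The instance ID, schedule expression, and S3 path are placeholders for your own values.

import boto3

ssm = boto3.client("ssm")

# State Manager association that runs a PowerShell command on the
# instance every morning. All IDs and paths below are placeholders.
ssm.create_association(
    Name="AWS-RunPowerShellScript",  # AWS-managed document for Windows instances
    Targets=[{"Key": "InstanceIds", "Values": ["i-0123456789abcdef0"]}],
    ScheduleExpression="cron(0 6 * * ? *)",  # every day at 06:00 UTC
    Parameters={
        "commands": ["aws s3 rm s3://my-bucket/my-folder/ --recursive"]
    },
    AssociationName="daily-s3-cleanup",
)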
I have a GCP AI notebook instance. Anyone with admin access to notebooks in my project can open a notebook and read, modify, or delete any folder/file created by me or any other user on my team. Is there a way to create a private directory like /home/user, as we could have done with JupyterHub installed on a VM?
Implementing your requirement is not feasible with AI Notebooks. AI Notebooks is intended as a rapid prototyping and development environment that can be easily managed, and advanced multi-user scenarios fall outside its intended purpose.
The Python kernel in AI Notebooks always runs under the Linux user "jupyter", regardless of which GCP user accesses the notebook. Anyone who has editor permissions on your Google Cloud project can see everyone else's work through the Jupyter UI.
In order to isolate each user's work, the recommended option is to set up an individual notebook instance for each user. See the 'Single User' option.
It's not feasible to combine multiple instances into a master instance in AI Notebooks. So the recommended approach is to give each user a notebook instance and share any source code via Git or another repository system. See the 'Save a Notebook to GitHub' documentation for more information.
You probably created the notebook using Service Account mode. You can restrict access to a single user via single-user mode.
Example:
proxy-mode=mail,proxy-user-mail=user@domain.com
I made a classifier in Python that uses a lot of libraries. I have uploaded the model to Amazon S3 as a pickle (my_model.pkl). Ideally, every time someone uploads a file to a specific S3 bucket, it should trigger an AWS Lambda that would load the classifier, return predictions and save a few files on an Amazon S3 bucket.
I want to know if it is possible to use an AWS Lambda function to execute a Jupyter notebook in AWS SageMaker. That way I would not have to worry about the dependencies, and it would generally make the classification more straightforward.
So, is there a way to use an AWS Lambda to execute a Jupyter Notebook?
Scheduling notebook execution is a bit of a SageMaker anti-pattern, because (1) you would need to manage data I/O (training set, trained model) yourself, (2) you would need to manage metadata tracking yourself, (3) you cannot run on distributed hardware, and (4) you cannot use Spot instances. Instead, for scheduled tasks it is recommended to leverage the various SageMaker long-running, background job APIs: SageMaker Training, SageMaker Processing, or SageMaker Batch Transform (in the case of batch inference).
That being said, if you still want to schedule a notebook to run, you can do it in a variety of ways:
In the SageMaker CI/CD re:Invent 2018 video, notebooks are launched as CloudFormation templates, and their execution is automated via a SageMaker lifecycle configuration.
AWS released this blog post to document how to launch notebooks from within Processing jobs.
But again, my recommendation for scheduled tasks would be to take them out of Jupyter, turn them into scripts, and run them in SageMaker Training.
Whatever you choose, all those tasks can be launched as API calls from within a Lambda function, as long as the function's role has the appropriate permissions.
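As one illustration, a minimal sketch of such a Lambda handler, here starting a SageMaker Training job when a file lands in S3. The job name, container image, role ARN, and S3 paths are placeholders, not real values.

import time
import boto3

sagemaker = boto3.client("sagemaker")

def handler(event, context):
    # Triggered by an S3 upload event; starts a SageMaker Training job
    # instead of executing a notebook. All names/ARNs are placeholders.
    bucket = event["Records"][0]["s3"]["bucket"]["name"]
    key = event["Records"][0]["s3"]["object"]["key"]

    sagemaker.create_training_job(
        TrainingJobName=f"classifier-{int(time.time())}",
        AlgorithmSpecification={
            "TrainingImage": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-image:latest",
            "TrainingInputMode": "File",
        },
        RoleArn="arn:aws:iam::123456789012:role/MySageMakerRole",
        InputDataConfig=[{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": f"s3://{bucket}/{key}",
            }},
        }],
        OutputDataConfig={"S3OutputPath": "s3://my-bucket/output/"},
        ResourceConfig={
            "InstanceType": "ml.m5.large",
            "InstanceCount": 1,
            "VolumeSizeInGB": 10,
        },
        StoppingCondition={"MaxRuntimeInSeconds": 3600},
    )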
I agree with Olivier. Using SageMaker for notebook execution might not be the right tool for the job.
Papermill is a framework built for running Jupyter notebooks in this fashion.
You can consider trying Clouderizer. It allows you to deploy your Jupyter notebook directly as a serverless cloud function and uses Papermill behind the scenes.
Disclaimer: I work for Clouderizer.
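For reference, the core Papermill call is quite small. A minimal sketch, with placeholder notebook names and parameters:

import papermill as pm

# Execute a parameterized notebook and write the executed copy (with cell
# outputs) alongside it. File names and parameters are placeholders; the
# input notebook needs a cell tagged "parameters" to receive values.
pm.execute_notebook(
    "classifier.ipynb",
    "classifier-output.ipynb",
    parameters={"input_key": "uploads/data.csv"},
)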
It's totally possible, and not an anti-pattern at all; it really depends on your use case. AWS actually published a great article describing how to do it, which includes a Lambda function.
I know there is a good tutorial on how to create Jupyter notebooks on AWS SageMaker "the easy way".
Do you know if it is possible to allow 10 students who do not have AWS accounts to create Jupyter notebooks, and also to edit them?
Enabling multiple users to use the same notebook (in this case, without authentication) will involve managing your security groups to enable open access. You can filter by allowing access only from a known IP address range, for example if your students are accessing it from a classroom or campus.
Tips for this are available in this answer and this page from the documentation, which dive into network configurations for SageMaker-hosted notebook instances.
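As one illustration, restricting inbound access to a known address range with boto3 might look like this. The security group ID and CIDR block are placeholders for your own values.

import boto3

ec2 = boto3.client("ec2")

# Allow HTTPS access only from a known IP range (e.g. a campus network).
ec2.authorize_security_group_ingress(
    GroupId="sg-0123456789abcdef0",  # placeholder security group
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 443,
        "ToPort": 443,
        "IpRanges": [{
            "CidrIp": "203.0.113.0/24",  # placeholder campus address range
            "Description": "Student access from campus network",
        }],
    }],
)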
As for enabling students to spin up their own notebooks, I'm not sure it's possible to enable completely unauthenticated AWS-level resource provisioning. However, once you've spun up a single managed notebook instance yourself, students can create their own notebooks directly from the browser in Jupyter, once they've navigated to the publicly available IP. You may need to attach a new SageMaker IAM role that enables notebook creation (amongst other things, depending on the workload requirements). Depending on the computational needs (number, duration, and types of concurrent workloads), different combinations of instance count and instance type will be optimal for preventing computational bottlenecks.
We're trying to figure out the best way to deploy to an auto-scaling AWS setup using Capistrano, and we're stuck on the best way to ensure new servers automatically get the latest code without having to rely on AMIs.
Any ideas?
Using User Data, you can have your EC2 instances pull the latest code each time a new instance is launched.
More info on user data here: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/user-data.html
TL;DR: user data is essentially a shell script that is executed when your EC2 instance launches. You can have it pull the latest code and run it.
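As an illustration, here is a hedged boto3 sketch of the mechanism. In a real auto-scaling setup the user data would live in your launch configuration or launch template rather than a one-off run_instances call; the AMI ID, repository path, and service name below are placeholders.

import boto3

ec2 = boto3.client("ec2")

# User data script executed as root on first boot: pull the latest code
# and restart the app. Paths and names are placeholders.
user_data = """#!/bin/bash
cd /var/www/myapp
git pull origin main
sudo service myapp restart
"""

ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder AMI
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
    UserData=user_data,  # boto3 base64-encodes this automatically
)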
@Moe's answer (or something like it) is the right one. But just as another thought, you could write some Ruby that queries AWS on deploy to fetch the list of servers to which Capistrano will deploy. The issue with this approach is that you would have to manually deploy to all servers every time auto-scaling adds a server, which somewhat defeats the purpose.