Multiuser public Jupyter notebook on AWS SageMaker

I know there is a good tutorial on how to create Jupyter notebooks on AWS SageMaker "the easy way".
Do you know if it is possible to allow 10 students who do not have AWS accounts to create and edit Jupyter notebooks?

Enabling multiple users to use the same notebook (in this case, without authentication) will involve managing your Security Groups to enable open access. You can restrict access to a known IP address range, for example if your students are accessing it from a classroom or campus.
Tips for this are available in this answer and this page from the documentation, which covers network configurations for SageMaker-hosted notebook instances.
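As a sketch, assuming your notebook's Jupyter server listens on the default port 8888 and your classroom's address range is 203.0.113.0/24 (both placeholder values), a single security group rule would restrict access to that range:

# Allow inbound Jupyter traffic only from the classroom CIDR range
# (the group ID, port, and CIDR below are placeholder values)
aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp \
    --port 8888 \
    --cidr 203.0.113.0/24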
As for enabling students to spin up their own notebooks, I'm not sure it's possible to allow completely unauthenticated provisioning of AWS-level resources. However, once you've spun up a single managed notebook instance yourself, students can create their own notebooks directly from the Jupyter UI in the browser once they've navigated to the publicly available IP. You may need to attach a SageMaker IAM role that enables notebook creation (among other things, depending on the workload requirements). Depending on the computational needs (the number, duration, and types of concurrent workloads), different combinations of instance count and instance type will be optimal for preventing computational bottlenecks.
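As a rough sketch of the IAM side (the role name below is a placeholder; a narrower custom policy is preferable in practice), attaching the AWS-managed AmazonSageMakerFullAccess policy to the instance's execution role grants broad SageMaker permissions:

# Attach the managed SageMaker policy to a (placeholder) execution role
aws iam attach-role-policy \
    --role-name MyNotebookExecutionRole \
    --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess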

Related

How do I protect access to data on my Google Compute Engine VM?

I want to work with sensitive data on a specific VM instance in GCP that my organization has contracted, and the data should not be seen by other members.
Usually, if I just set up a VM instance, other members of my organization are free to create a user, connect to the VM with SSH, and gain sudo privileges.
So, I'm wondering if I shouldn't keep sensitive data inside the VM.
Q. Is there a way to prevent other users from accessing my VM instance data?
Q. Is OS Login suitable for the above purpose? If there is a simpler and more typical method, I would like to adopt it.
I currently have the role of "editor" on the GCP project.
Thanks.

GCP AI notebook instance permission

I have a GCP AI notebook instance. Anyone with admin access to notebooks in my project can open a notebook and read, modify, or delete any folder/file created by me or any other user on my team. Is there a way to create a private directory like /home/user, as we could have done with JupyterHub installed on a VM?
Implementing your requirement is not feasible with AI Notebooks. AI Notebooks is intended to be a rapid prototyping and development environment that can be easily managed, and advanced multi-user scenarios fall outside its intended purpose.
The Python Kernel in AI Notebooks always runs under the Linux user "Jupyter" regardless of what GCP user accesses the notebook. Anyone who has editor permissions to your Google Cloud project can see each other's work through the Jupyter UI.
To isolate each user's work, the recommended option is to set up an individual notebook instance for each user; see the "Single User" option.
It's not feasible to combine multiple instances into a master instance in AI Notebooks. So, the recommended way is to give each user a notebook instance and share any source code via Git or another repository system. See the "Save a notebook to GitHub" doc for more information.
You probably created the notebook using Service Account mode. You can provide access to a single user only via single-user mode.
Example:
proxy-mode=mail,proxy-user-mail=user@domain.com
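A sketch of creating such an instance with gcloud, assuming the notebooks command group is available in your SDK version; the instance name, image family, zone, and email below are placeholders:

# Create a notebook instance locked to a single (placeholder) user
gcloud notebooks instances create my-single-user-notebook \
    --vm-image-project=deeplearning-platform-release \
    --vm-image-family=common-cpu-notebooks \
    --machine-type=n1-standard-4 \
    --location=us-central1-a \
    --metadata=proxy-mode=mail,proxy-user-mail=user@domain.com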

GCP machine images and credentials

I have a question regarding Google Cloud custom images and how/if credentials are stored. Namely, if I customize a VM and save the machine image with public access, am I possibly exposing credentials?
In particular, I'm working on a cloud-based application that relies on a "custom" image which has both gsutil and Docker installed. Basic GCE VMs have gsutil pre-installed but do not have Docker. On the other hand, the Container-Optimized OS has Docker but does not have gsutil. Hence, I'm just starting from a basic Debian image and installing Docker to get what I need.
Ideally, when I distribute my application, I would like to just expose that customized image for public use; this way, users will not have to spend extra effort to make their own images.
My concern, however, is that since I have used gsutil on the customized VM, persisting this disk to an image might inadvertently save some credentials related to my project (if so, where are they?). Hence, anyone using my image would also get those credentials.
I tried to reproduce your situation. I created a custom image from the disk of an instance that could access my project's Storage buckets. Then, I shared the image with another user in a different project. The user could create an instance out of that shared image. However, when he tried to access my project's buckets, he encountered an AccessDeniedException error.
Based on this reproduction and my investigation, your credentials are not exposed with the image. IAM permissions are granted through roles given to a user, a group, or a service account; sharing an image can't grant them to others.
Furthermore (as Patrick W mentioned below), anything you run from within a GCE VM instance will use the VM's service account (unless otherwise specified). As long as the service account has access to the bucket, so will your applications (including Docker containers).
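If you want to verify this yourself, the standard GCE metadata endpoint reports which service account an instance is running as; run this from inside the VM:

# Print the service account identity the VM (and anything on it) uses
curl -H "Metadata-Flavor: Google" \
    "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/email"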

Prevent an Elastic Beanstalk app from being downloaded using the AWS account?

Short background: we're a small business, but our clients are much larger businesses. We have some software they subscribe to which is deployed to AWS Elastic Beanstalk. Clients have their own DevOps teams, unlike us, and will need to manage some of the technical support. They will need access to the AWS account running the software, so they can do things like reboot the server, clear the database if they screw it up, change the EC2 instance type, etc. This is OK, but we want to prevent the software from being downloaded outside of the AWS account.
The software is a java WAR running on Tomcat, on a single elastic beanstalk instance. We only care about limiting access to the WAR file (not the database for example).
The Beanstalk application versions page appears to have no way to download the WAR file, which is good. They could SSH into the underlying EC2 instance, though, so presumably they could just copy the WAR out of the Tomcat directory. Given the complexity of AWS, there are probably other ways they could get at the WAR file too (e.g., clone the EBS volume and attach it to another EC2 instance).
I assume that the machine images available for purchase via the AWS Marketplace must have some form of copy protection, but I've not been able to find any details on this. Also, it looks like AWS only accepts Marketplace vendors who are much larger than us, so the Marketplace option may not be open to us.
Any idea how I could prevent access to the WAR file running on Elastic Beanstalk while still allowing the client access to the AWS account? (Or at least make access hard.)
The only solution that comes to mind would be removing any EC2 SSH key pairs from the account and specifically denying them access to ec2:CreateKeyPair. Really, what you need to be doing is granting them least-privilege access to the account; that is, specifically granting them access only to those actions they absolutely need.
This will go a long way, but with sufficient knowledge of AWS, it's going to be an uphill battle to ensure you give them enough access to do what they need while not giving them more than you want. I'd question whether a legal option (contracts, licenses, etc.) would be better protection for this.
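A minimal sketch of the explicit-deny side (the policy name is arbitrary, and you would attach it to the client's IAM users or group):

# Explicitly deny creating or importing SSH key pairs
aws iam create-policy \
    --policy-name DenyKeyPairManagement \
    --policy-document '{
      "Version": "2012-10-17",
      "Statement": [{
        "Effect": "Deny",
        "Action": ["ec2:CreateKeyPair", "ec2:ImportKeyPair"],
        "Resource": "*"
      }]
    }'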

Bootstrapping AWS auto-scaled instances

We are discussing with a client how to bootstrap auto-scaled AWS instances. Essentially, an instance comes up with hardly anything on it. It has a generic startup script that asks somewhere, "What am I supposed to do next?"
I'm thinking we can use Amazon tags and have the instance itself ask AWS, using the awscli toolset, to find out its role. This could give Puppet info, environment info (dev/stage/prod, for example), and so on. This should be doable with just the DescribeTags privilege. I'm facing resistance, however.
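For instance, something like this from the startup script, where Role is a hypothetical tag key we would define:

# Ask the metadata service who we are, then look up our Role tag
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
REGION=$(curl -s http://169.254.169.254/latest/meta-data/placement/region)
aws ec2 describe-tags --region "$REGION" \
    --filters "Name=resource-id,Values=$INSTANCE_ID" "Name=key,Values=Role" \
    --query "Tags[0].Value" --output text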
I am looking for suggestions on how a fresh AWS instance can find out its own purpose, whether from AWS itself or perhaps from a service broker of some sort.
EC2 instances offer a feature called User Data that is meant to solve this problem. User Data can run a shell script at first boot to perform provisioning on new instances. A typical pattern is to use the User Data script to download or clone a configuration management repository (Chef, Puppet, or Ansible, for example) and run it locally on the box to perform more complete provisioning.
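A minimal sketch of that pattern, assuming an Amazon Linux base AMI whose repositories provide the packages, and with a placeholder repository URL:

#!/bin/bash
# User Data runs as root at first boot
yum install -y git puppet
# Clone the (placeholder) configuration repo and apply it locally
git clone https://example.com/infra/puppet-config.git /opt/puppet-config
puppet apply /opt/puppet-config/manifests/site.pp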
As @e-j-brennan states, it's also common to prebundle an AMI that has already been provisioned. This approach is faster, since no provisioning needs to happen at boot time, but it is perhaps less flexible since the instance isn't customized at launch.
You may also be interested in instance metadata, which exposes some data such as network details and tags via a URL path accessible only to the instance itself.
An instance doesn't have to come up with "hardly anything on it", though. You can (and should) build your own custom AMI (Amazon Machine Image) with any and all software you need running on it, and when you need to auto-scale, you boot new instances from the AMI you previously created and saved.
http://docs.aws.amazon.com/gettingstarted/latest/wah-linux/getting-started-create-custom-ami.html
I would recommend using AWS Elastic Beanstalk for creating these instances; this makes it easier, since it will create the Auto Scaling groups and launch configurations (the boot-up code), which you can edit later. Also, you only pay for the EC2 instances, and you can manage most things from the Beanstalk console.
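For example, with the EB CLI (application and environment names are placeholders; check eb init's flags for your CLI version):

# Initialize the app and create an environment; Beanstalk creates the
# Auto Scaling group and launch configuration behind the scenes
eb init my-app --platform tomcat --region us-east-1
eb create my-env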