Machine Learning (NLP) on AWS. Cloud9? SageMaker? EC2-AMI? - amazon-web-services

I have finally arrived in the cloud to put my NLP work to the next level, but I am a bit overwhelmed with all the possibilities I have. So I am coming to you for advice.
Currently I see three possibilities:
SageMaker
Jupyter Notebooks are great
It's quick and simple
saves a lot of time spent on managing everything, you can very easily get the model into production
costs more
no version control
Cloud9
EC2(-AMI)
Well, that's where I am for now. I really like SageMaker, although I don't like the lack of version control (at least I haven't found anything for now).
Cloud9 seems just to be an IDE to an EC2 instance.. I haven't found any comparisons of Cloud9 vs SageMaker for Machine Learning. Maybe because Cloud9 is not advertised as an ML solution. But it seems to be an option.
What is your take on that question? What have I missed? What would you advise me to go for? What is your workflow and why?

I am looking for an easy work environment where I can quickly test my models, exactly. And it won't be only me working on it, it's a team effort.
Since you are working as a team I would recommend to use sagemaker with custom docker images. That way you have complete freedom over your algorithm. The docker images are stored in ecr. Here you can upload many versions of the same image and tag them to keep control of the different versions(which you build from a git repo).
Sagemaker also gives the execution role to inside the docker image. So you still have full access to other aws resources (if the execution role has the right permissions)
https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/scikit_bring_your_own/scikit_bring_your_own.ipynb
In my opinion this is a good example to start because it shows how sagemaker is interacting with your image.
Some notes on other solutions:
The problem of every other solution you posted is you want to build and execute on the same machine. Sure you can do this but keep in mind, that gpu instances are expensive and therefore you might only switch to the cloud when the code is ready to run.
Some other notes
Jupyter Notebooks in general are not made for collaborative programming. I think they want to change this with jupyter lab but this is still in development and sagemaker only use the notebook at the moment.
EC2 is cheaper as sagemaker but you have to do more work. Especially if you want to run your model as docker images. Also with sagemaker you can easily build an endpoint for model inference which would be even more complex to realize with ec2.
Cloud 9 I never used this service and but on first glance it seems good to develop on, but the question remains if you want to do this on a gpu machine. Because you're using ec2 as instance you have the same advantage/disadvantage.

One thing I'd like to call out first is SageMaker notebook is not the only IDE environment in which you can interact with other components of SageMaker such as training and hosting. In fact you can make API calls to SageMaker training/hosting through Cloud9 or any IDEs you've installed on EC2 or even your laptop, as long as you have AWS SDK or SageMaker Python SDK installed.
Regarding the choice of the IDE, it's really up to your particular needs. SageMaker notebook is Jupyter based (now also supports JupyterLab beta), ML focused, and fully managed. Hundreds of Python packages that are commonly used in ML, as well as Tensorflow, Keras, MxNet, SageMaker Python SDK, etc., are preinstalled and automatically maintained for you. It also integrates more closely with other components of SageMaker as one can imagine.
Cloud9 is a managed IDE too but it is for general purpose rather than ML specific. If you want to use Jupyter on cloud9 it requires extra work from your side. It does not preinstall and maintain the version of common ML/DL related packages like SageMaker notebook does.

Related

Is it possible to use Sagemaker Notebooks with a Docker image as your environment?

I'm currently developing a system that some private libraries. I'm developing in local mode and then when I need to process something specific I use Sagemaker Processing Jobs. The thing is that in order to speed up the process it would be nice to have the possibility of developing everything in a cloud environment.
I'm wondering if is possible to use the same Docker image that I use
for batch processing (the one that I use for Sagemaker Processing Job)
in my Sagemaker Jupyter Notebooks of my cloud environment?
The main problem here is that every time that I work in my cloud Notebooks I have to deal with dependencies conflicts and etc. Using a Docker image would avoid this, and will also allow to each member of the team use the same image to develop in the cloud without having to deal with these kind of conflicts.
You can use the same Docker image to run a processing job locally using SageMaker local mode (basically setting the instance_type parameter on the Processor to local.
However, it sounds like you'd want to use the same image as your dev environment in notebooks. In SageMaker notebook instances, the solution would be to create and maintain conda environments with the same requirements and versions (you can also use LCCs to install a set of packages at notebook start, see some samples here).
An alternative is to use SageMaker Studio, where you can create and bring your own custom image for Studio. There is a detailed tutorial here, and some sample dockerfiles for you to get started here.

Best Devops solution on GCP

I am quite new to GCP. My requirement is to implement devops solution on GCP. We are going to use python scripts and bigqueries.
I want to know which is the best cost effective devops solution to implement in GCP?
The built in CI/CD solution on Google Cloud is Cloud Build. I like this tool and I strongly recommend it. In summary, you have to define the steps to your build. Each steps are based on container. Load it, use it, go to the next one. Only the /workspace directory is kept between step (which creates some challenge sometime). You can redefine your entrypoint, your env vars for a step,... There is a lot of capabilities and there is a lot of help/tips on Stack Overflow or elsewhere.
For the pricing, it's interesting: you have 120 minutes of build free per day and PER BILLING ACCOUNT.
I'm not a Jenkins expert, I used it 6 years ago!
The main difference is the GUI and Plugins. You can do all with the GUI with jenkins, with Cloud Build, only the trigger and the jobs running/terminated (+ logs) are viewable on the console. The steps' configurations are only done by code (YAML or JSON file). Plugin are custom workers, but you haven't the same library as Jenkins.
On the other hand, Jenkins need to be hosted on VM, to be upgraded, the VM to be patched. And you have a minimum fee for Jenkins even if you have any builds.
Opinionated answer are difficult, because it depends on many factors!!

Data Science/Engineering (Dev/Prod) Environment

I am going to create environments. For now i have gcp machine and i run jupyter in there. Everytime, i need start it, and with 3 people it is hard to work in same environment. I know, there is docker, jupyter hub, but did not find and suitable roadmap to create dev/prod environment.
My aim to create dev and production environment. Everything should be on GCP.
Any suggested path ?
Thanks
You can take a look at the best practices for enterprise organizations. In order to properly split resources it's often advised to use different projects. However, depending on the GCP product, you could also use versions, such as with App Engine (see this StackOverflow thread).

implement a tool that uses the technologies - Jenkins, Docker, Docker Swarm, and AWS

I want to implement a tool that uses the technologies - Jenkins, Docker, Docker Swarm, and AWS - to achieve a deployment tool that our team of developers can use to manage instances and deploys.
Please recommend what technologies should we (both administrators and developers) be using, what needs to be built and what sorts of machines must be having.
Any help here would be much appreciated.
Your question is too generic to provide a specific answer, as there are different approaches to implement what you are trying to achieve. IMHO the best approach would be to talk with your existing dev team & administrators and come up with a solution which all parties find easy to manage and maintain container based environment rather than specifying several specific technologies.
Each tool you have mentioned has different capabilities and also there are other tools that provide the same features which would be more ideal for your situation. (Thats why proper understanding between Devs and admins are necessary on what you really want to achieve.) .
Since you have asked about what kind of machines you must be having (I suppose this is on AWS env) try Core OS on AWS instances. CoreOS (Container Linux) will be the best option to manage and run your container based environments. [About CoreOS]
Jenkins can run in a docker container and issue docker commands to deploy new docker containers that reside in the same swarm as jenkins. You also need to hook into a software repo like git. Jenkins Blue Ocean is something you could look at for pipe-lining your dev->build->test->deploy->maintain pipes. Also, Travis-ci, github, JIRA, and Dockerhub are useful components to what you are trying to achieve.

AWS AMI export to local Ubuntu

I have got a running AMI Ubuntu instance with CI/CD pipeline with Jenkins, Tomcat, Selenium, Sonar, Nagios etc .I have used elastic ip directly in different configuration to get it working.
Is there any easy and direct way to export this image to my local Ubuntu apart from AWS export/import service.
For what it's worth I don't agree with the down votes. Cloud is fast becoming "infrastructure as code" and I place emphasis on the code aspect. As the differences between programming and infrastructure management begin to fade (see FaaS - functions as a service) there needs to be room for having that conversation in this forum IMHO.
That said, let's move on to your questions. AWS provides instance export functionality though this solution may not fit all use cases. Two links below illustrate how to perform that export. YMMV. Good luck!
http://docs.aws.amazon.com/cli/latest/reference/ec2/create-instance-export-task.html
http://docs.aws.amazon.com/vm-import/latest/userguide/vmexport.html