I am quite new to GCP. My requirement is to implement a DevOps solution on GCP. We are going to use Python scripts and BigQuery queries.
I want to know which is the most cost-effective DevOps solution to implement on GCP.
The built-in CI/CD solution on Google Cloud is Cloud Build. I like this tool and I strongly recommend it. In summary, you define the steps of your build. Each step is based on a container: load it, use it, go to the next one. Only the /workspace directory is kept between steps (which sometimes creates challenges). You can redefine the entrypoint and the env vars for a step, and so on. There are a lot of capabilities, and there is a lot of help and tips on Stack Overflow and elsewhere.
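To make the step model concrete, here is a minimal sketch that generates a Cloud Build config as JSON from Python. The two steps, the container images and the `STAGE` variable are illustrative assumptions, not part of the original answer; the only point is that a file written under /workspace in one step is still there in the next.

```python
import json


def make_build_config():
    """Return a Cloud Build config as a plain dict (hypothetical two-step build)."""
    return {
        "steps": [
            {
                # Step 1: a Python container writes a file under /workspace.
                "name": "python:3.11-slim",
                "entrypoint": "bash",
                "args": ["-c", "python --version > /workspace/version.txt"],
            },
            {
                # Step 2: a different container reads the same file back,
                # because /workspace is the directory shared between steps.
                "name": "ubuntu",
                "entrypoint": "cat",
                "args": ["/workspace/version.txt"],
                # Per-step environment variable (hypothetical name and value).
                "env": ["STAGE=dev"],
            },
        ]
    }


if __name__ == "__main__":
    # Save the output as cloudbuild.json to use it as a build config file.
    print(json.dumps(make_build_config(), indent=2))
```

Saved as cloudbuild.json, a file like this should be usable with `gcloud builds submit --config cloudbuild.json .`, since Cloud Build accepts both YAML and JSON configs.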
For the pricing, it's interesting: you get 120 free build minutes per day, PER BILLING ACCOUNT.
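As a rough sketch of what that free tier means for a budget, the arithmetic below counts only the minutes above 120 per day. The per-minute price is deliberately left as a parameter (an assumption you have to fill in from the current pricing page), and the daily totals are meant to be summed over every project attached to the same billing account.

```python
FREE_MINUTES_PER_DAY = 120  # free tier quoted in the answer above


def billable_minutes(daily_build_minutes):
    """Sum the build minutes that exceed the daily free tier.

    daily_build_minutes: one total per day, across the whole billing account.
    """
    return sum(max(0, used - FREE_MINUTES_PER_DAY) for used in daily_build_minutes)


def estimated_cost(daily_build_minutes, price_per_minute):
    """Estimate the bill; price_per_minute is a caller-supplied assumption."""
    return billable_minutes(daily_build_minutes) * price_per_minute
```

For example, three days with 60, 120 and 150 build minutes give 30 billable minutes, because unused free minutes do not carry over to the next day in this model.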
I'm not a Jenkins expert, I used it 6 years ago!
The main differences are the GUI and the plugins. With Jenkins you can do everything in the GUI; with Cloud Build, only the triggers and the running/terminated jobs (plus logs) are viewable in the console. The steps are configured only by code (a YAML or JSON file). Plugins are custom workers, but you don't have the same library as with Jenkins.
On the other hand, Jenkins needs to be hosted on a VM and upgraded, and the VM needs to be patched. And you pay a minimum fee for Jenkins even if you don't run any builds.
Opinionated answers are difficult, because they depend on many factors!
Related
Until now, I've used Cloud Build as a vanilla CI/CD for running Terraform and for building the infrastructure (sometimes I have Docker containers to build, sometimes not).
Now that Cloud Workflows is available, I was wondering if it could be a better tool for pipelining the execution of atomic steps, for ease of use and better control (for example conditional execution, error handling, centralized log pushing and so on).
I think that all of the above can be done in Cloud Build, but it's usually not trivial to do.
Is Workflows OK for that and, if not, what is the best use case for this new tool instead?
You can have similarities if, for example, your Cloud Build only calls APIs to run/deploy/configure stuff.
However, keep in mind 2 things:
Cloud Workflows can only call APIs and sleep. You can't build a container image (with Docker, for example) with Workflows. It's not a runtime environment, just something that calls APIs.
Cloud Build can be triggered on push, tag and pull request. You can't do that with Workflows.
So yes, sometimes you can ask yourself whether you can swap one for the other, but personally, I think that you have to use the right product for the right job.
API call orchestration -> Workflow
CICD -> Cloud Build
I'm presently looking into GCP's Deployment Manager to deploy new projects, VMs and Cloud Storage buckets.
We need a web front end that authenticated users can connect to in order to deploy the required infrastructure, though I'm not sure which DevOps tools are recommended to work with this system. We have an instance of Jenkins and Octopus Deploy, though I see on Google's Configuration Management page (https://cloud.google.com/solutions/configuration-management) they suggest other tools like Ansible, Chef, Puppet and SaltStack.
I'm supposing that through one of these I can update something simple like a name variable in the config.yaml file and deploy a project.
Could I also ensure a chosen name for a project, VM or Cloud Storage bucket fits with a specific naming convention with one of these systems?
Which system do others use and why?
I use Deployment Manager, as all 3rd party tools are reliant upon the presence of GCP APIs, as well as trusting that those APIs are in line with the actual functionality of the underlying GCP tech.
GCP is decidedly behind the curve on API development, which means that even if you wanted to use TF or whatever, at some point you're going to be stuck inside the SDK, anyway. So that's why I went with Deployment Manager, as much as I wanted to have my whole infra/app deployment use other tools that I was more comfortable with.
To specifically answer your question about validating naming schema, what you would probably want to do is write a wrapper script that uses the gcloud deployment-manager subcommand. Do your validation in the wrapper script, then run the gcloud deployment-manager stuff.
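A minimal sketch of such a wrapper could look like this. The naming convention (`<team>-<env>-<name>`) is a made-up example, and the function names are hypothetical; the script only validates the deployment name, so names of projects, VMs or buckets inside config.yaml would need the same kind of check applied to the config values.

```python
import re
import subprocess

# Hypothetical convention: <team>-<env>-<name>, lowercase, env is dev/test/prod.
NAME_PATTERN = re.compile(r"[a-z][a-z0-9]*-(dev|test|prod)-[a-z0-9]+")


def is_valid_name(name):
    """Return True when the name matches the (assumed) naming convention."""
    return NAME_PATTERN.fullmatch(name) is not None


def build_command(deployment_name, config_path):
    """Validate the name first, then build the gcloud command line."""
    if not is_valid_name(deployment_name):
        raise ValueError(f"name does not follow the convention: {deployment_name!r}")
    return [
        "gcloud", "deployment-manager", "deployments", "create",
        deployment_name, "--config", config_path,
    ]


def deploy(deployment_name, config_path):
    """Run the deployment; raises CalledProcessError if gcloud fails."""
    subprocess.run(build_command(deployment_name, config_path), check=True)
```

Keeping validation and command construction separate from `deploy` means the convention can be unit tested without touching GCP.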
Word of warning about Deployment Manager: it makes troubleshooting very difficult. Very often it will obscure the error that can help you actually establish the root cause of a problem. I can't tell you how many times somebody in my office has shouted "UGGH! Shut UP with your Error 400!" I hope that Google takes note from my pointed survey feedback and refactors DM to pass the original error through.
Anyway, hope this helps. GCP has come a long way, but they've still got work to do.
I have finally arrived in the cloud to put my NLP work to the next level, but I am a bit overwhelmed with all the possibilities I have. So I am coming to you for advice.
Currently I see three possibilities:
SageMaker
Jupyter Notebooks are great
It's quick and simple
saves a lot of time spent on managing everything, you can very easily get the model into production
costs more
no version control
Cloud9
EC2(-AMI)
Well, that's where I am for now. I really like SageMaker, although I don't like the lack of version control (at least I haven't found anything for now).
Cloud9 seems just to be an IDE to an EC2 instance.. I haven't found any comparisons of Cloud9 vs SageMaker for Machine Learning. Maybe because Cloud9 is not advertised as an ML solution. But it seems to be an option.
What is your take on that question? What have I missed? What would you advise me to go for? What is your workflow and why?
I am looking for an easy work environment where I can quickly test my models, exactly. And it won't be only me working on it, it's a team effort.
Since you are working as a team, I would recommend using SageMaker with custom Docker images. That way you have complete freedom over your algorithm. The Docker images are stored in ECR. There you can upload many versions of the same image and tag them to keep control of the different versions (which you build from a git repo).
SageMaker also makes the execution role available inside the Docker container, so you still have full access to other AWS resources (if the execution role has the right permissions).
https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/scikit_bring_your_own/scikit_bring_your_own.ipynb
In my opinion this is a good example to start because it shows how sagemaker is interacting with your image.
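One way to tie image versions to the git repo, sketched under my own assumptions, is to tag each image with the short commit hash and build the ECR image URI from it. The account ID, region and repository name below are placeholders, and the URI layout is the one used in the standard commercial regions.

```python
def tag_from_commit(commit_sha, length=7):
    """Use the short git commit hash as the image tag (an assumed convention)."""
    return commit_sha[:length]


def ecr_image_uri(account_id, region, repository, tag):
    """Build the ECR image URI for a tagged image."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


if __name__ == "__main__":
    # Placeholder values, not real resources.
    tag = tag_from_commit("abc1234def5678900000000000000000000000ff")
    print(ecr_image_uri("123456789012", "eu-west-1", "nlp-train", tag))
```

With this scheme every training run can record the exact image URI it used, which gives the team a traceable link back to the code version.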
Some notes on other solutions:
The problem with every other solution you posted is that you build and execute on the same machine. Sure, you can do this, but keep in mind that GPU instances are expensive, so you might only want to switch to the cloud when the code is ready to run.
Some other notes
Jupyter Notebooks in general are not made for collaborative programming. I think they want to change this with JupyterLab, but this is still in development and SageMaker only uses the notebook at the moment.
EC2 is cheaper than SageMaker, but you have to do more work, especially if you want to run your model as Docker images. Also, with SageMaker you can easily build an endpoint for model inference, which would be more complex to realize with EC2.
Cloud9: I have never used this service, but at first glance it seems good to develop on. The question remains whether you want to do this on a GPU machine. Because you're using EC2 as the instance, you have the same advantages and disadvantages.
One thing I'd like to call out first is that the SageMaker notebook is not the only IDE environment in which you can interact with other components of SageMaker such as training and hosting. In fact, you can make API calls to SageMaker training/hosting from Cloud9 or any IDE you've installed on EC2 or even your laptop, as long as you have the AWS SDK or the SageMaker Python SDK installed.
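As a sketch of what such an API call involves, the function below only builds the request for a training job as a plain dict, so it runs without any AWS dependency. The instance type, volume size and runtime limit are placeholder assumptions; with boto3 installed and credentials configured, the dict would be passed to `boto3.client("sagemaker").create_training_job(**request)`.

```python
def training_job_request(job_name, image_uri, role_arn, output_s3_uri):
    """Build keyword arguments for SageMaker's CreateTrainingJob API call."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,   # your own image in ECR
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,              # execution role the job runs as
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": "ml.m5.large",  # placeholder choice
            "InstanceCount": 1,
            "VolumeSizeInGB": 10,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }
```

The same dict works from a notebook, Cloud9, an EC2 box or a laptop, which is the point: the IDE is independent of where training runs.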
Regarding the choice of the IDE, it's really up to your particular needs. The SageMaker notebook is Jupyter based (it now also supports JupyterLab beta), ML focused, and fully managed. Hundreds of Python packages that are commonly used in ML, as well as TensorFlow, Keras, MXNet, the SageMaker Python SDK, etc., are preinstalled and automatically maintained for you. It also integrates more closely with other components of SageMaker, as one can imagine.
Cloud9 is a managed IDE too, but it is general purpose rather than ML specific. If you want to use Jupyter on Cloud9, it requires extra work on your side. It does not preinstall and maintain common ML/DL packages the way the SageMaker notebook does.
I'm redoing a badly built web application that my company uses in python/django (after deciding it was the best tool for the job).
I don't have much time to spend on development, which means I have even less time to get it deployed, and since it's resource intensive and will be used by a lot of people concurrently, I'd like to be able to take advantage of all the tools that AWS offers, such as RDS, ElastiCache, CloudWatch, and especially any auto scaling tools.
I've seen Heroku and liked it, but I would prefer to use AWS, and the price seems quite high.
I don't mind getting my hands dirty as long as it doesn't take half the development time setting up deployment.
I'm looking for something we can use, whether it be a service or AMI so that we can deploy automatically from our repository, without spending days configuring it and figuring out how to get it working, and without drastically increasing the price to host our app.
As you want something quick and simple, maybe consider RightScale's ServerTemplates to get you up and running quickly. RightScale have a free developer account. There are a few Django ServerTemplates and they are all priced for "All Users", so they'll work with the free developer account.
That will get you a base application stack quickly.
Next, I'd look into using Fabric (similar to Capistrano) and/or GitHub post-commit hooks to automate deployment of your application.
Once you're happy with that and have more time on your hands you could look at adding all the other stuff you want to use (ElastiCache, etc).
Heroku runs on AWS: http://devcenter.heroku.com/articles/external-services
So you can use AWS services from Heroku just as you would from any EC2 instance. If you really want to, use Heroku for the hard-to-set-up services and a small AWS EC2 instance for the services you manage yourself.
To automate the deployment you can use a 3rd party tool like Capistrano or http://nudow.com. Capistrano will do a lot of the deployment, but you have to host it yourself and you have to do the deployment in a specific way for it to work correctly (such as using the same keys everywhere, etc). Nudow.com is easier to set up and is hosted. It will deploy to your existing infrastructure and will do stuff like versioning. It also has a lot of tools to do things like minifying JavaScript/CSS and uploading to CloudFront.
Quite a few build and CI systems support steps for pushing build output to Azure, but I haven't seen any which can actually run on Azure (or EC2). Ideally I would like to be able to spin up an arbitrary number of instances (depending on the number of pending submits) to deal with the actual build + quality gates (UTs, FxCop, other static analysis tools) + source repository check-in process.
Are there existing tools which can do this, or has anyone built something which they can discuss?
Thanks!
[Edit: I found this question which is quite similar but didn't have any informative answers, so I'll keep my question alive]
If you're using Git or Mercurial for source control, AppHarbor might be what you're looking for. It's a CI build/deploy environment that runs exclusively in the cloud (EC2), and can deploy build output to Azure.
Here are some links for reference:
http://sourcecodebean.com/archives/appharbor-heroku-for-net/987
http://lostechies.com/chrismissal/2011/03/12/using-appharbor-for-continuous-integration
http://haacked.com/archive/2011/05/12/making-let-me-bing-that-for-you-open-source.aspx
http://appharbor.com/page/pricing
The open source Jenkins CI server has an EC2 plugin that will spin up EC2 instances automatically depending on your build load. I couldn't find anything for Azure, but I highly recommend Jenkins: it's easy to configure, well maintained and has stacks of features.
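The sizing rule the question asks about (instances proportional to pending submits) can be sketched in a few lines. This is only an illustration of the idea, not how the Jenkins EC2 plugin actually decides; `builds_per_instance` and `max_instances` are hypothetical knobs.

```python
import math


def instances_needed(pending_builds, builds_per_instance, max_instances):
    """Return how many build instances to run for the current queue.

    Rounds up so that no pending build is left without capacity, and caps
    the result at max_instances to keep costs bounded.
    """
    if builds_per_instance <= 0:
        raise ValueError("builds_per_instance must be positive")
    if pending_builds <= 0:
        return 0
    return min(max_instances, math.ceil(pending_builds / builds_per_instance))
```

For instance, 5 pending builds with 2 builds per instance and a cap of 10 yields 3 instances, while an empty queue yields 0 so idle instances can be shut down.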
Continuous Integration on Windows Azure http://code.google.com/p/cassis/ (over Mercurial)
Disclaimer: work produced by my 1st year CS students
TeamCity also has support for this: http://www.jetbrains.com/teamcity/features/amazon_ec2.html