I am very new to Cloud Foundry/Bosh and have a set of basic questions.
1) Droplet vs Garden container: I understand that a droplet contains source code + buildpack, and that droplets are executed in Garden containers. IMHO, containers are good for transporting to other systems. Why is there an intermediate notion of droplets? Shouldn't the container image itself contain everything, making droplets unnecessary?
2) Diego cell: What is the role of a Diego cell? (I assume its job is only to start/stop Garden containers.) Are Diego cells platform dependent (e.g. can a particular cell run only garden-windows containers while another runs only garden-linux containers)? Do we need one cell per container?
3) In the description of Diego cell, I read "Each application VM has a Diego Cell that executes application start and stop actions locally, manages the VM’s containers, and reports app status and other data to the BBS and Loggregator."
What is the application VM mentioned here? Does it mean a container?
4) Let's assume I use Bosh to create my Cloud Foundry instance. After some time, I need to scale my system to two VMs (due to an increase in load). Do I need to create a new manifest for the second VM (as the earlier manifest would also deploy the entire CF onto this VM)?
A container is, roughly speaking, a root file system image together with some things like resource limits and metadata about what volumes to mount, what processes to run, etc.
Garden is an API for creating and running container specifications. Anyone can write a server that implements the Garden API; the core Cloud Foundry teams maintain garden-linux, garden-runc, and garden-windows implementations.
A droplet is a "built" artifact created from source code that is typically mounted or streamed into a Garden container and then run. There are times when you do not want a separate droplet, and want your root file system as well as all source code and/or built artifacts baked into a single image. Often, however, you do want this separation between the droplet, which represents your code, and the root file system. One major benefit is that CVEs present in lower-level dependencies that are common to most containers can be uniformly repaired across all tenants and all running applications on the Cloud Foundry platform without any developers having to re-push their code. For example, if a new patch is required for something like openssl, and your Cloud Foundry installation has thousands of developers and hundreds of thousands of running Garden containers, it is much better if an operator can roll out the openssl patch to all containers with a single command.
The Diego cell is a VM that's part of the Cloud Foundry architecture. Cloud Foundry itself is a distributed system with different components responsible for different things. There is a component responsible for user authorization and authentication, there are components for aggregating logs from applications, there is a component responsible for providing the developer-facing API for creating, scaling, and managing applications, etc. The Diego cells are responsible for essentially taking requests to run containerized workloads, and running them. User requests to run an application are consumed by the user-facing API and translated into a request to the Diego backend. Diego itself has several components, including a scheduler, and the scheduler's job is to select which cell should do a given piece of work.
You can think of the cell as having two components: (1) a Garden server for running containers, and (2) a representative that represents that Garden server to the Diego scheduler. Rather than Garden having any Diego-specific knowledge (Garden can function in a standalone manner), the scheduler talks to each Garden's Diego representative on the same cell.
I'm not sure what "application VM" means in the quote you pulled out. Each application running on Cloud Foundry can be run with multiple parallel instances (for fault tolerance, better concurrency, etc.). Each application instance runs as a Garden container on some Diego cell. A production deployment of Cloud Foundry will have many Diego cells, and each cell can run many Garden containers (up to hundreds). For better fault tolerance, the Diego scheduler will attempt to place the instances of a given application on different Diego cells, rather than cramming them all into the same cell, since if that single cell went down, your whole application would go down.
You do not need to create a new manifest to scale up BOSH deployments. Just change the instances value of whatever job/instance group you want to have more VMs of, and re-run bosh deploy.
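For example, a minimal sketch of that manifest change and re-deploy (the job name and manifest file name here are illustrative, not from your deployment):

    # Excerpt of a BOSH deployment manifest -- raise "instances" on the
    # job/instance group you want more VMs of:
    #
    #   jobs:
    #   - name: diego_cell
    #     instances: 2    # was 1; BOSH creates the second VM on deploy
    #
    # Then re-run the deployment; BOSH converges to the new count:
    bosh deployment cf-manifest.yml
    bosh deploy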
1) The droplet is a container image that is stored persistently when you upload your application with a cf push. This image will be re-used whenever the system creates a new container with the same image (for example, when you restart your application, or scale to multiple instances). Without the droplet, you would have to re-upload your application and create the container image every time you wanted a new instance. (A short sketch of this CLI flow follows this list.)
2) The Diego cell runs in a large VM that hosts many containers. A typical Diego cell might be 32 GB in size, while a typical container might be 1 GB. Diego cells currently only run Linux-Garden containers.
3) The application VM is just the VM that is hosting the Diego cell. I found the sentence a bit confusing because I tend to use the term Diego cell to refer both to the cell software and to the "application VM" which is hosting it.
4) Bosh will use multiple VMs to deploy Cloud Foundry. Single-VM deployments do exist (see for example, http://pivotal.io/pcf-dev) but they are not deployed with Bosh.
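To make point 1 concrete, a minimal sketch of the CLI flow (the app name is hypothetical):

    # First push: source is uploaded, staged with a buildpack, and the
    # resulting droplet is stored by the platform
    cf push my-app

    # Later operations reuse the stored droplet -- no re-upload or
    # re-staging needed:
    cf scale my-app -i 3   # run three instances from the same droplet
    cf restart my-app      # restart, again from the stored droplet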
I am building a lab utility to deploy my team's development environments (testing / stress etc.).
At present, the pipeline is as follows:
Trigger pipeline via HTTP request; args contain the distribution, web server, and web server version, using ARGs that are passed to multi-stage Dockerfiles.
Docker Buildx builds the container image (if it doesn't exist in ECR)
Pipeline job pushes that container to ECR (if it doesn't already exist).
Terraform deploys the container using Fargate, and sets up VPCs and an ALB to handle ingress externally.
FQDN / TLS is then provisioned on ...com
Previously when I've made tools like this that create environments, environments were managed and deleted solely at the project level, since each environment had its own project (this being best practice for isolation and billing-tracking purposes). However, given my company's organisational security constraints, I am limited to only one project in which I can create all the resources.
This means I have to find a way to manage/deploy 30 (the max) environments in one project without it being a bit of a clustered duck.
More or less, I am looking for a way to keep track of environments and (autonomously) tear them down, along with their associated resources, by some identifier; most likely these environments can be separated by resource tags/groups.
CDKTF/Pulumi look like a neat way of achieving some form of "high-level" structure, but I am struggling to find ways to use them to do what I want. If anyone can recommend an approach, it'd be appreciated.
I have not tried anything yet, mainly because this is something that requires planning before I start work on it (don't like hitting dead ends, ha).
I'm preparing to get into the world of cloud computing.
My first question is:
Is it possible to programmatically create a new, or duplicate an existing VM from my server?
Project Background
I provide a file processing service, and as it's been growing I need to offer a better service.
Project Requirements
Machine specs:
HDD: Min 16 GB
CPU: Min 1 core
RAM: Min 2 GB
GPU: Min CUDA 10.1 compatible
What I'm thinking is the following steps:
User uploads a file
A dedicated VM is created for that specific file inside Google Cloud Compute
The file is sent to the VM
File is processed using an Anaconda environment
Results are downloaded to local server
Dedicated VM is removed
Results are served to user
How is this accomplished?
PS: I'm looking for resources and advice. Not code.
Your question is a perfect formulation of the concept of Google Cloud Run. At the highest level concept, you create a Docker image (think of it like a VM) and then register that Docker image with GCP Cloud Run. When a trigger occurs, GCP will spin up an instance of that Docker container and pass in information about the cause of that trigger (a file created in GCS or a REST request or others ...). What you do in your container is up to you. You have full power of the Linux environment (under Docker) to do as you like. When your request ends, the container is spun down. You are only billed for the compute resources you use. If your container (VM) isn't being used, you pay nothing until the next trigger.
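As a rough sketch of that flow with the gcloud CLI (the project, image, and service names here are hypothetical):

    # Build the processing code into a container image and push it
    gcloud builds submit --tag gcr.io/my-project/file-processor

    # Deploy it as a Cloud Run service; instances are spun up per
    # request and scale to zero when idle, so you pay only while a
    # file is actually being processed
    gcloud run deploy file-processor \
      --image gcr.io/my-project/file-processor \
      --region us-central1 \
      --memory 2Gi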
An alternative to Cloud Run is Cloud Functions. This is a higher level abstraction where instead of providing a Docker container, you provide the body of a function (JavaScript, Java, Python or others) and the request is passed to that function when a trigger occurs. Which you use is mostly personal choice (you didn't elaborate on "File is processed").
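For comparison, a Cloud Functions deployment might look roughly like this (the function name, bucket, and runtime are illustrative assumptions):

    # Deploy a function that fires whenever a file is finalized in a
    # GCS bucket; the event metadata identifies the uploaded file
    gcloud functions deploy process_file \
      --runtime python37 \
      --trigger-resource my-upload-bucket \
      --trigger-event google.storage.object.finalize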
References:
Cloud Run
Cloud Functions
When PCF is installed on IaaS, does it get installed on multiple VMs which are denoted as cells by PCF?
Will each cell contain a garden implementation?
Will all cells have all the different implementations of garden (windows, linux and docker)?
Can a single cell have both windows and linux based apps running?
Some of these questions are not entirely clear, but I'll try my best to answer them.
When PCF is installed on IaaS, does it get installed on multiple VMs which are denoted as cells by PCF?
Yes, Cloud Foundry comprises multiple VMs. The VMs are deployed and managed by Bosh (or Ops Manager & Bosh, if you're using Pivotal Cloud Foundry).
This is not an exhaustive list, but you'll see VMs for jobs like the Cloud Controller, UAA, Doppler, Traffic Controller and, of course, your Diego Cells.
The Diego Cells are where your applications run though, so you will typically have more Cells than any other VM type.
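You can see this layout for yourself once the deployment is up; the bosh CLI will list every VM it manages (names vary by manifest):

    # List the VMs in the deployment; Diego cells appear alongside the
    # Cloud Controller, UAA, router, and the other jobs
    bosh vms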
Will each cell contain a garden implementation?
Yes. Garden Linux (called Guardian) on your Linux Cells & Garden Windows for Windows Cells.
Will all cells have all the different implementations of garden (windows, linux and docker)?
No. Linux Cells run Linux based apps (most of the build packs and Docker) and Windows Cells run Windows apps (HWC build pack).
Can a single cell have both windows and linux based apps running?
No, unless you want to count the fact that you can run .NET Core apps on Linux. That's a little different though.
If you want to deploy both Linux & Windows apps, you'll need to have at least two Cells. One for Linux & one for Windows.
Hope that helps!
@punter-vicky - Initially, if you run cf stacks you will see output like:
    name         description
    cflinuxfs2   Cloud Foundry Linux-based filesystem
The "Using PCF Runtime for Windows" portion of Pivotal's documentation gives a complete overview of how to install and use Windows cells.
Once you have both types of cells available, the first priority Diego considers in granting a winning auction bid is whether the cell offers the correct stack for the application being bid on.
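As a concrete illustration, the stack is selected at push time (the app, stack, and buildpack names below are illustrative):

    # Target the Linux stack (the default on most installations)
    cf push my-linux-app -s cflinuxfs2

    # Target a Windows stack with the HWC buildpack, assuming Windows
    # cells are installed and the stack is registered
    cf push my-windows-app -s windows2012R2 -b hwc_buildpack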
I'm working on a web app and I want to migrate it to a virtual machine scale set in the Windows Azure cloud. I'm new to cloud computing, and so far I haven't found any proper tutorial about virtual machine scale sets. Please can someone help with this?
A few things to consider:
You could build a custom VM which contains the complete app, or you could use VM extensions to deploy the app on a platform image each time a new VM in the scale set is deployed. See: https://msftstack.wordpress.com/2016/04/20/deploying-applications-in-azure-vm-scale-sets/ for some thoughts on this. Ultimately it might depend on how much you need to install over a base image, and how fast you want scaling to be.
Do you need autoscale based on resource usage or do you plan to manually increase/decrease the number of VMs in the set? See https://azure.microsoft.com/en-us/documentation/articles/virtual-machine-scale-sets-windows-autoscale/
A good way to get started with scale sets is to deploy an existing template directly from Azure Quick start templates. Look at https://github.com/Azure/azure-quickstart-templates and search for vmss. These templates will give you an idea of some of the options you have.
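For instance, one of the vmss templates can be deployed straight from that repo with the newer az CLI (the resource group is hypothetical, and you would substitute the path of whichever vmss template you picked):

    # Create a resource group and deploy a quickstart scale set template
    az group create --name myScaleSetGroup --location westus
    az group deployment create \
      --resource-group myScaleSetGroup \
      --template-uri https://raw.githubusercontent.com/Azure/azure-quickstart-templates/master/<vmss-template>/azuredeploy.json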
To learn the basics about VM Scale Sets, start with the documentation page: https://azure.microsoft.com/documentation/services/virtual-machine-scale-sets/ and the GA announcement: https://azure.microsoft.com/en-us/blog/azure-virtual-machine-scale-sets-ga/
Also look at higher level services like the Azure Web App service if you haven't already, the advantage of a higher level service is that some of the basic web app operations get taken care of for you: https://azure.microsoft.com/en-us/services/app-service/web/
For an application I'm converting to the Cloud Foundry platform, I have a couple of template files. These are basically templates for documents that will be converted to PDF's. What are my options when it comes to having these available to my application? There are no persistent system drives, so I can't just upload them, it seems. Cloud Foundry suggests for you to save them on something like Amazon S3, Dropbox or Box, or simply having them in a database as blobs, but this seems like a very curious work-around.
These templates will change separately from application files, so I'm not intending to have them in the application Jar.
Cloud Foundry suggests for you to save them on something like Amazon S3, Dropbox or Box, or simply having them in a database as blobs, but this seems like a very curious work-around.
Why do you consider this a curious work-around?
One of the primary benefits of Cloud Foundry is elastic scalability. Once your app is running on CF, you can easily scale it up and down on demand. As you scale up, new copies of the app are started in fresh containers. As you scale down, app containers are destroyed. Only the files that are part of the original app push are put into a fresh container.
If you have files like these templates that are changing over time and you store them in the container's file system, you would need to make sure that all instances of the app have the same copies of the template file at all times as you scale up and down. When you upload new templates, you would have to make sure they get distributed to all instances, not just the one instance processing the upload. As new app instances are created on scale-up, you would need to make sure they have the latest versions of the templates.
Another benefit of CF is app health management. If an instance of your app crashes for any reason, CF will detect this and start a new instance in a fresh container. Again, only files that were part of the original app push will be dropped into the fresh container. You would need to make sure that the latest version of the template files got added to the container after startup.
Storing files like this that have a lifecycle separate from the app in a persistence store outside of the app container solves all these problems, and ensures that all instances of the app get the same versions of the files as you scale instances up and down or as crashed instances are resurrected.
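As one possible shape for this, assuming the templates live in an S3 bucket (the bucket name and paths are hypothetical), each app instance can pull the current templates when it starts:

    # Run at app startup (e.g. from the start command), so every
    # instance -- fresh, scaled-up, or resurrected after a crash --
    # fetches the same, latest templates:
    aws s3 cp s3://my-app-templates/pdf/ ./templates/ --recursive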