I am looking to create a number of Deis clusters running in parallel on AWS and haven't been able to find any good documentation on how to do so. From what I understand I'd have to do the following:
When provisioning the cluster:
Create a new discovery URL
Give the stack a name other than the default "deis" when using the ./provision-aws-cluster.sh script
Create different Deis profiles in $HOME/.deis/client.json that map to each cluster
And when utilizing the deisctl and deis command line interfaces, I need to specify the DEISCTL_TUNNEL and the DEIS_PROFILE each time, respectively.
Am I missing anything? Will this impact my current Deis cluster if I install using the changes listed above?
That is correct; I don't believe you are missing anything. You should save the cloud-config for each cluster (in contrib/coreos), since it will contain the discovery URL and possibly other customizations, depending on how your clusters are configured. If the clusters are going to differ on the AWS side, make sure you save the cloudformation.json file for each as well.
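To avoid retyping DEISCTL_TUNNEL and DEIS_PROFILE for every command, the per-cluster settings can be kept in one place. The following is only a minimal sketch: the cluster names, tunnel addresses, and profile names are made-up placeholders, and cluster_env is a hypothetical helper, not part of Deis.

```python
import os

# Hypothetical registry of clusters. The names, tunnel addresses, and
# profile names are placeholders, not values from the original post.
CLUSTERS = {
    "staging": {"tunnel": "203.0.113.10", "profile": "staging"},
    "production": {"tunnel": "203.0.113.20", "profile": "production"},
}


def cluster_env(name, base_env=None):
    """Return a copy of the environment with the two Deis variables set."""
    cluster = CLUSTERS[name]
    env = dict(os.environ if base_env is None else base_env)
    env["DEISCTL_TUNNEL"] = cluster["tunnel"]  # read by deisctl
    env["DEIS_PROFILE"] = cluster["profile"]   # read by the deis client
    return env


if __name__ == "__main__":
    env = cluster_env("staging")
    print(env["DEISCTL_TUNNEL"], env["DEIS_PROFILE"])
```

The returned dictionary can be passed as the env argument of subprocess.run when invoking deisctl or deis, so each call targets exactly one cluster.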
I am setting up a CI/CD pipeline on Gitlab to deploy a full AWS EKS cluster using Terraform. I got that to work rather well, but now I want to be able to perform some tasks on the cluster from that pipeline. Specifically, I am following this guide from Gitlab on how to manually add a cluster to Gitlab and am trying to put that in a script.
My issue is that I need to run kubectl commands from within the pipeline, but I cannot figure out how to authenticate from there without creating a custom image containing the aws-iam-authenticator from AWS. That honestly doesn't seem like the right way to do it, so I figure there has to be a better way.
Maybe I am thinking totally wrong and I do not need to use kubectl and there is a totally different approach I can take. If that's true, please tell me so. If not, I'd love to know if there is a different way.
I have been researching the different ways of authenticating with a k8s cluster, but every single resource I have found that is tailored for EKS insists I need to use the aws-iam-authenticator program.
I'm writing a k8s operator. If it knows which cloud provider the cluster is running on, it can do some platform-specific tasks for users, such as preparing some default storage classes.
But how can an operator running in the k8s cluster know whether it is on GCP or AWS?
After scanning through the APIs, the cloud provider leaves some clues here and there, for example, for the GKE cluster I am running now, it has an API named: /apis/nodemanagement.gke.io/v1alpha1
But I think that's a little too hacky, and I wonder if there is a more formal way to get this info.
No, this is not exposed in a consistent way. You should have the user put it in their config file or whatever.
Indeed, it's not consistent. When the configuration is added by default to kubectl, you have these patterns:
> kubectl config current-context
# For GCP
> gke_gbl-imt-homerider-basguillaueb_europe-west1-b_my-first-cluster-1
# For AWS
> arn:aws:eks:eu-west-1:306974639454:cluster/demo-knative
You can also rename the config if you prefer your own pattern.
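The two default naming patterns above can be turned into a rough heuristic. This is only a sketch: guess_provider is a hypothetical helper, it works on the kubectl context name (so it runs on the client side, not inside an operator), and it returns None as soon as someone renames the context.

```python
def guess_provider(context_name):
    """Guess the cloud provider from a default kubectl context name.

    Relies only on the two default patterns shown above; a renamed
    context yields None.
    """
    if context_name.startswith("gke_"):
        return "gcp"
    if context_name.startswith("arn:aws:eks:"):
        return "aws"
    return None


if __name__ == "__main__":
    print(guess_provider("arn:aws:eks:eu-west-1:306974639454:cluster/demo-knative"))
```

Because the result depends on a naming convention rather than an API, it is best treated as a default that the user can override in configuration.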
Unlike HortonWorks or Cloudera, AWS EMR does not seem to give any GUI to change xml configurations of various hadoop ecosystem frameworks.
Logging into my EMR namenode and doing a quick
find / -iname yarn-site.xml
I found yarn-site.xml at /etc/hadoop/conf.empty/yarn-site.xml and capacity-scheduler.xml at /etc/hadoop/conf.empty/capacity-scheduler.xml.
But note how these are under conf.empty and I suspect these might not be the actual locations for yarn-site and capacity-scheduler xmls.
I understand that I can change these configurations while making a cluster but what I need to know is how to be able to change them without tearing apart the cluster.
I just want to play around with scheduling properties and such and try out different schedulers to identify what might work well with my Spark applications.
Thanks in advance!
Well, yarn-site.xml and capacity-scheduler.xml are indeed in the correct location (/etc/hadoop/conf.empty/). On a running cluster, editing them on the master node and restarting the YARN ResourceManager daemon will change the scheduler.
When spinning up a new cluster, you can use the EMR Configurations API to change the appropriate values: http://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-configure-apps.html
For example, specify the appropriate values in the capacity-scheduler and yarn-site classifications of your Configuration, and EMR will change those values in the corresponding XML files.
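As a rough illustration of what such a Configuration looks like, the sketch below builds the list of classifications as plain Python data and prints it as JSON. The helper name scheduler_configurations and the particular property values are assumptions chosen for the example; only the yarn-site and capacity-scheduler classifications come from the answer above.

```python
import json


def scheduler_configurations(scheduler_class, capacity_props=None):
    """Build an EMR configurations list for the YARN scheduler settings."""
    configs = [
        {
            "Classification": "yarn-site",
            "Properties": {
                "yarn.resourcemanager.scheduler.class": scheduler_class,
            },
        }
    ]
    if capacity_props:
        # Property values are given as strings, matching the XML files.
        configs.append(
            {
                "Classification": "capacity-scheduler",
                "Properties": dict(capacity_props),
            }
        )
    return configs


if __name__ == "__main__":
    # Example values only: switch to the fair scheduler and tune one
    # capacity-scheduler property.
    configs = scheduler_configurations(
        "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler",
        {"yarn.scheduler.capacity.maximum-am-resource-percent": "0.5"},
    )
    print(json.dumps(configs, indent=2))
```

The printed JSON can be saved to a file and supplied as the configurations when the cluster is created.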
Edit: Sep 4, 2019:
With Amazon EMR version 5.21.0 and later, you can override cluster configurations and specify additional configuration classifications for each instance group in a running cluster. You do this by using the Amazon EMR console, the AWS Command Line Interface (AWS CLI), or the AWS SDK.
Please see
I have a basic django/postgres app running locally, based on the Docker Django docs. It uses docker compose to run the containers locally.
I'd like to run this app on Amazon Web Services (AWS), and to deploy it using the command line, not the AWS console.
My Attempt
When I tried this, I ended up with:
this yml config for ecs-cli
these notes on how I deployed from the command line.
Note: I was trying to fire up the Python dev server in a janky way, hoping that would work before I added nginx. The cluster (RDS+server) would come up, but then the instances would die right away.
Issues I Then Failed to Solve
I realized over the course of this:
the setup needs another container for a web server (nginx) to run on AWS (like this blog post, but the tutorial uses the AWS Console, which I wanted to avoid)
ecs-cli uses a different syntax for yml/json config than docker-compose, so you need to have some separate/similar code from your local docker.yml (and I'm not sure if my file above was correct)
So, what ecs-cli commands and config do I use to deploy the app, or am I going about things all wrong?
Feel free to say I'm doing it all wrong. I could also use Elastic Beanstalk - the tutorials on this don't seem to use docker/docker-compose, but seem easier overall (at least well documented).
I'd like to understand why any given approach is a good way to do this.
One alternative you may wish to consider in lieu of ECS, if you just want to get it up in the amazon cloud, is to make use of docker-machine using the amazonec2 driver.
When running docker-compose, just ensure the remote Amazon host machine is ACTIVE, which you can check with docker-machine ls.
One item you will have to revisit in the Amazon Mgmt Console is opening the applicable ports, such as port 80 and any other ports exposed in the compose file. Once the security group is in place for the VPC, you should be able to simply refer to the VPC ID on subsequent executions, bypassing any need to use the Mgmt Console to add the ports. You may wish to bump up the instance size from the default t2.micro to match the t2.medium specified in your NOTES.
If ECS orchestration is needed, then you will need to create a task definition containing the container definitions you require, as defined in your docker compose file. My recommendation would be to use the Mgmt Console to construct the definition, then grab the accompanying JSON definition that it makes available and store it in your source code repository. On future command-line runs it can be referenced when registering task definitions and when running tasks and services within a given cluster.
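To give an idea of what that JSON definition contains, here is a minimal sketch that assembles one for a two-container Django/Postgres setup. The family name, the web image name, the memory sizes, and the command are all placeholders invented for the example, and task_definition is a hypothetical helper; the definition produced by the console for your app will differ.

```python
import json


def task_definition(family, containers):
    """Assemble a minimal ECS task definition from container settings."""
    return {"family": family, "containerDefinitions": list(containers)}


# Placeholder containers roughly mirroring a db + web compose file.
db = {
    "name": "db",
    "image": "postgres",
    "memory": 256,       # MiB, example value
    "essential": True,
}
web = {
    "name": "web",
    "image": "example/django-app:latest",  # hypothetical image name
    "memory": 256,
    "essential": True,
    "links": ["db"],     # lets the web container reach the db container
    "portMappings": [{"containerPort": 8000, "hostPort": 80}],
    "command": ["python", "manage.py", "runserver", "0.0.0.0:8000"],
}

if __name__ == "__main__":
    print(json.dumps(task_definition("django-app", [db, web]), indent=2))
```

Saved to a file, output like this is the kind of JSON that can be kept in the repository and registered from the command line instead of through the console.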
We're currently running a Kubernetes 1.0 cluster on AWS in production, and we'd like to spin up a second cluster to test out 1.1. Based on the AWS helper functions, it looks like multiple clusters aren't supported, but I wanted to be sure. There's a doc that describes running multiple clusters, but it's fairly brief.
In general, we'd like to have a second cluster continuously running for testing purposes. It seems like this would be a fairly common need.
You should be able to run a second cluster by setting INSTANCE_PREFIX before running kube-up. That variable in turn sets CLUSTER_ID which should parameterize everything in the cluster/aws/* scripts.
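A minimal sketch of how that might be scripted, assuming kube-up is driven through environment variables: kube_up_env is a hypothetical helper, the prefix "kube-test" is a made-up example, and KUBERNETES_PROVIDER is set on the assumption that the AWS scripts are selected that way.

```python
import os


def kube_up_env(instance_prefix, base_env=None):
    """Return an environment for kube-up with a distinct INSTANCE_PREFIX."""
    env = dict(os.environ if base_env is None else base_env)
    env["KUBERNETES_PROVIDER"] = "aws"        # assumption: selects cluster/aws/*
    env["INSTANCE_PREFIX"] = instance_prefix  # distinguishes the second cluster
    return env


if __name__ == "__main__":
    env = kube_up_env("kube-test")
    print(env["INSTANCE_PREFIX"])
```

Using a different prefix for each cluster keeps the resources created by the scripts from colliding with those of the production cluster.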