nomad: Pull docker image from ECR with AWS Access and Secret keys - amazon-iam

My problem
I have successfully deployed a nomad job with a few dozen Redis Docker containers on AWS, using the default Redis image from Dockerhub.
I've slightly altered the default config file created by nomad init to change the number of running containers, and everything works as expected.
The problem is that the actual image I would like to run is in ECR, which requires AWS permissions (an access key and a secret key), and I don't know how to supply these.
Code
job "example" {
datacenters = ["dc1"]
type = "service"
update {
max_parallel = 1
min_healthy_time = "10s"
healthy_deadline = "3m"
auto_revert = false
canary = 0
}
group "cache" {
count = 30
restart {
attempts = 10
interval = "5m"
delay = "25s"
mode = "delay"
}
ephemeral_disk {
size = 300
}
task "redis" {
driver = "docker"
config {
# My problem here
image = "https://-whatever-.dkr.ecr.us-east-1.amazonaws.com/-whatever-"
port_map {
db = 6379
}
}
resources {
network {
mbits = 10
port "db" {}
}
}
service {
name = "global-redis-check"
tags = ["global", "cache"]
port = "db"
check {
name = "alive"
type = "tcp"
interval = "10s"
timeout = "2s"
}
}
}
}
}
What have I tried
Extensive Google Search
Reading the manual
Placing the AWS credentials on the machine that runs the Nomad job (using aws configure)
My question
How can Nomad be configured to pull Docker images from AWS ECR using AWS credentials?

Pretty late for you, but AWS ECR does not handle authentication in the way that Docker expects. You need to run sudo $(aws ecr get-login --no-include-email --region ${your region}); running the returned command actually authenticates in a Docker-compliant way.
Note that the region is optional if the AWS CLI is configured. Personally, I attach an IAM role to the box (allowing ECR pull/list/etc.) so that I don't have to deal with credentials manually.
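A related approach (not from the answer above, and only a hedged sketch) is to let the Nomad client authenticate to ECR itself via the amazon-ecr-credential-helper, so job files never carry credentials. This assumes Nomad 0.9+ and that the docker-credential-ecr-login binary is installed on the client host:
plugin "docker" {
  config {
    auth {
      # Nomad prepends "docker-credential-", so this invokes docker-credential-ecr-login,
      # which uses the instance's IAM role (or configured AWS credentials) to fetch ECR tokens.
      helper = "ecr-login"
    }
  }
}
With that in place, the job's image can reference the ECR repository directly (without an https:// prefix).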

I don't use ECR, but if it acts like a normal Docker registry, this is what I do for my registry, and it works. Assuming it does behave the same, it should work fine for you as well:
config {
  image = "registry.service.consul:5000/MYDOCKERIMAGENAME:latest"

  auth {
    username = "MYMAGICUSER"
    password = "MYMAGICPASSWORD"
  }
}
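If you take this route with ECR specifically, the username is the literal string AWS and the password is a short-lived token; a hedged variant with placeholder values follows (note that ECR tokens expire after roughly 12 hours, so hard-coding one in the job file is fragile compared to a credential helper on the client):
config {
  # Placeholder registry/repository; ECR image references take no https:// prefix.
  image = "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-repo:latest"

  auth {
    username = "AWS"
    # Output of `aws ecr get-login-password` (or the older `aws ecr get-login`).
    password = "<ecr-token>"
  }
}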

Related

Django Terraform digitalOcean re-create environment in new droplet

I have a SaaS-based Django app. When a customer asks to use my software, I want to auto-provision a new droplet and auto-deploy the app there, and the info (IP, customer name, database info, etc.) should be saved in my database.
This is my Terraform script, and it is working very well; the database is now running on the droplet:
terraform {
  required_providers {
    digitalocean = {
      source  = "digitalocean/digitalocean"
      version = "~> 2.0"
    }
  }
}

provider "digitalocean" {
  token = "dop_v1_60f33a1<MyToken>a363d033"
}

resource "digitalocean_droplet" "web" {
  image    = "ubuntu-18-04-x64"
  name     = "web-1"
  region   = "nyc3"
  size     = "s-1vcpu-1gb"
  ssh_keys = ["93:<The SSH finger print>::01"]

  connection {
    host        = self.ipv4_address
    user        = "root"
    type        = "ssh"
    private_key = file("/home/py/.ssh/id_rsa") # it works
    timeout     = "2m"
  }

  provisioner "remote-exec" {
    inline = [
      "export PATH=$PATH:/usr/bin",
      # install docker-compose
      # install docker
      # clone my github repo
      "docker-compose up --build -d"
    ]
  }
}
When I run the commands, I want it to create a new droplet and a new database instance, and to connect the database to my Django .env file.
Everything should be created automatically. Can anyone please help me with how I can do this?
Or is my approach wrong? In this situation, what would be the best solution?
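One possible direction, sketched below with hypothetical names (not a definitive implementation): add a managed database cluster alongside the droplet and render its connection string into the .env file that your provisioner copies onto the droplet. This assumes the hashicorp/local provider is available and that your docker-compose setup reads DATABASE_URL.
resource "digitalocean_database_cluster" "app_db" {
  name       = "customer-db" # hypothetical name
  engine     = "pg"          # managed PostgreSQL
  version    = "13"
  size       = "db-s-1vcpu-1gb"
  region     = "nyc3"
  node_count = 1
}

# Render the connection details into a local .env file that a "file" provisioner
# (or the remote-exec step above) can place next to docker-compose.yml.
resource "local_file" "django_env" {
  filename = "${path.module}/.env"
  content  = <<-EOT
    DATABASE_URL=${digitalocean_database_cluster.app_db.uri}
  EOT
}
For per-customer provisioning, this could be driven from a separate Terraform workspace (or a for_each over a customer map), with the droplet IP and database URI exposed as outputs that your Django backend records after each terraform apply.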

cannot ssh into instance created from sourceImage "google_compute_instance_from_machine_image"

I am creating an instance from a sourceImage, using this terraform template:
resource "tls_private_key" "sandbox_ssh" {
algorithm = "RSA"
rsa_bits = 4096
}
output "tls_private_key_sandbox" { value = "${tls_private_key.sandbox_ssh.private_key_pem}" }
locals {
custom_data1 = <<CUSTOM_DATA
#!/bin/bash
CUSTOM_DATA
}
resource "google_compute_instance_from_machine_image" "sandboxvm_test_fromimg" {
project = "<proj>"
provider = google-beta
name = "sandboxvm-test-fromimg"
zone = "us-central1-a"
tags = ["test"]
source_machine_image = "projects/<proj>/global/machineImages/sandboxvm-test-img-1"
can_ip_forward = false
labels = {
owner = "test"
purpose = "test"
ami = "sandboxvm-test-img-1"
}
metadata = {
ssh-keys = "${var.sshuser}:${tls_private_key.sandbox_ssh.public_key_openssh}"
}
network_interface {
network = "default"
access_config {
// Include this section to give the VM an external ip address
}
}
metadata_startup_script = local.custom_data1
}
output "instance_ip_sandbox" {
value = google_compute_instance_from_machine_image.sandboxvm_test_fromimg.network_interface.0.access_config.0.nat_ip
}
output "user_name" {
value = var.sshuser
}
I can't even ping or netcat either the private or the public IP of the created VM. Even the "serial port" SSH, passed inside the custom script, doesn't help.
I suspect that, since this is a "google-beta" capability, it may not even be working or reliable yet.
Maybe we just can't yet create VMs (i.e. GCEs) from "SourceImages" in GCP, unless proven otherwise by finding a simple goof-up that isn't very evident in my TF.
I actually managed to solve it, and the root cause reflects rather poorly on GCE.
The problem was that, while creating the base image, the instance I had chosen had the following:
#sudo update-alternatives --install /usr/bin/python python /usr/bin/python3.6 2
#sudo update-alternatives --install /usr/bin/python3 python /usr/bin/python3.7 1
Maybe I should have tried with "python3" instead of "python",
but when instantiating GCEs based on this MachineImage, it looks for the rather deprecated "python2.7" instead of "python3" and complains about missing / unreadable packages like netplan etc.
Commenting out the "update-alternatives" lines and installing python3.6 and python3.7 explicitly did the trick!
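(For completeness: when a fresh VM like this is unreachable on both its private and public IPs, it is also worth confirming that an ingress firewall rule allows SSH to the instance's network tags. A hedged sketch with a hypothetical name, assuming the default network and the "test" tag used above:)
resource "google_compute_firewall" "allow_ssh_test" {
  name    = "allow-ssh-test"
  project = "<proj>"
  network = "default"

  allow {
    protocol = "tcp"
    ports    = ["22"]
  }

  source_ranges = ["0.0.0.0/0"] # tighten to a trusted CIDR in practice
  target_tags   = ["test"]
}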

Google GKE startup script not working for GKE node

I am adding a startup script to my GKE nodes using Terraform:
provider "google" {
project = var.project
region = var.region
zone = var.zone
credentials = "google-key.json"
}
terraform {
backend "gcs" {
bucket = "tf-state-bucket-devenv"
prefix = "terraform"
credentials = "google-key.json"
}
}
resource "google_container_cluster" "primary" {
name = var.kube-clustername
location = var.zone
remove_default_node_pool = true
initial_node_count = 1
master_auth {
username = ""
password = ""
client_certificate_config {
issue_client_certificate = false
}
}
}
resource "google_container_node_pool" "primary_preemptible_nodes" {
name = var.kube-poolname
location = var.zone
cluster = google_container_cluster.primary.name
node_count = var.kube-nodecount
node_config {
preemptible = var.kube-preemptible
machine_type = "n1-standard-1"
disk_size_gb = 10
disk_type = "pd-standard"
metadata = {
disable-legacy-endpoints = "true",
startup_script = "cd /mnt/stateful_partition/home && echo hi > test.txt"
}
oauth_scopes = [
"https://www.googleapis.com/auth/logging.write",
"https://www.googleapis.com/auth/monitoring",
]
}
}
When I go into the GCP interface, select the node and view the metadata, I can see that the key/value is there:
metadata_startup_script
#!/bin/bash sudo su && cd /mnt/stateful_partition/home && echo hi > test.txt
However, when running the below command on the node:
sudo google_metadata_script_runner --script-type startup --debug
I got the following output:
startup-script: INFO Starting startup scripts.
startup-script: INFO No startup scripts found in metadata.
startup-script: INFO Finished running startup scripts.
Does anyone know why this script is not working / showing up? Is it because it's a GKE node and Google doesn't let you edit these? I can't actually find anything in their documentation where they specifically say that.
You cannot specify startup scripts to run on GKE nodes. The node has a built-in startup sequence to initialize the node and join it to the cluster and to ensure that this works properly (e.g. to ensure that when you ask for 100 nodes you get 100 functional nodes) you cannot add additional logic to the startup sequence.
As an alternative, you can create a DaemonSet that runs on all of your nodes to perform node-level initialization. One advantage of this is that you can tweak your DaemonSet and re-apply it to existing nodes (without having to recreate them) if you want to change how they are configured.
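To make that concrete, here is a hedged sketch of such a DaemonSet using the Terraform kubernetes provider (hypothetical names; assumes the provider is configured against this cluster). A privileged pod on every node runs the same command the startup script was meant to run against the host filesystem, then stays alive so the DaemonSet doesn't crash-loop:
resource "kubernetes_daemonset" "node_init" {
  metadata {
    name      = "node-init"
    namespace = "kube-system"
  }

  spec {
    selector {
      match_labels = {
        app = "node-init"
      }
    }

    template {
      metadata {
        labels = {
          app = "node-init"
        }
      }

      spec {
        container {
          name  = "init"
          image = "busybox:1.35"

          # Write the test file onto the node's filesystem via the hostPath mount,
          # then block forever so the pod stays Running.
          command = ["sh", "-c", "echo hi > /host/mnt/stateful_partition/home/test.txt && tail -f /dev/null"]

          security_context {
            privileged = true
          }

          volume_mount {
            name       = "host-root"
            mount_path = "/host"
          }
        }

        volume {
          name = "host-root"

          host_path {
            path = "/"
          }
        }
      }
    }
  }
}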
Replace the metadata key name startup_script with startup-script (a hyphen rather than an underscore); that is the key the metadata script runner looks for.
In addition, your startup script already runs as the root user, so you don't need to perform a sudo su.

What may be the reason for helm_release in a Terraform script being terribly slow?

I have a Terraform script for my AWS EKS cluster, with the following pieces in it:
provider "helm" {
alias = "helm"
debug = true
kubernetes {
host = module.eks.endpoint
cluster_ca_certificate = module.eks.ca_certificate
token = data.aws_eks_cluster_auth.cluster.token
load_config_file = false
}
}
and:
resource "helm_release" "prometheus_operator" {
provider = "helm"
depends_on = [
module.eks.aws_eks_auth
]
chart = "stable/prometheus-operator"
name = "prometheus-operator"
values = [
file("staging/prometheus-operator-values.yaml")
]
wait = false
version = "8.12.12"
}
With this setup it takes ~15 minutes to install the required chart with terraform apply, and sometimes it fails (with helm ls showing a pending-install status). On the other hand, if I use the following command:
helm install prometheus-operator stable/prometheus-operator -f staging/prometheus-operator-values.yaml --version 8.12.12 --debug
the required chart gets installed in ~3 minutes and never fails. What is the reason for this behavior?
EDIT
Here is a log file from a failed installation. It's quite big - 5.6M. What bothers me a bit is located at lines 47725 and 56045.
What's more, helm status prometheus-operator gives valid output (as if it was successfully installed), however there are no pods defined.
EDIT 2
I've also raised an issue.
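No definitive answer here, but one thing worth ruling out, sketched below under stated assumptions (a recent helm provider, the aws CLI on the PATH, and a placeholder cluster name): the token from aws_eks_cluster_auth is obtained once per run and EKS tokens are short-lived, so a long install can outlive it, whereas the helm CLI refreshes credentials from the kubeconfig on each call. Exec-based credentials let the provider re-issue the token on demand:
provider "helm" {
  alias = "helm"
  debug = true

  kubernetes {
    host                   = module.eks.endpoint
    cluster_ca_certificate = module.eks.ca_certificate

    # Re-issue the EKS token on demand instead of reusing a possibly expired one.
    # (api_version may need to be client.authentication.k8s.io/v1alpha1 on older versions.)
    exec {
      api_version = "client.authentication.k8s.io/v1beta1"
      command     = "aws"
      args        = ["eks", "get-token", "--cluster-name", "<cluster-name>"]
    }
  }
}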

nixops: how to use local ssh key when deploying on machine with existing nixos (targetEnv is none)?

I have a machine running NixOS (provisioned using Terraform; config), and I want to connect to it using deployment.targetHost = ipAddress and deployment.targetEnv = "none".
But I can't configure nixops to use the ./secrets/stage_ssh_key SSH key.
This is not working (actually this is not documented; I found it here: https://github.com/NixOS/nixops/blob/d4e5b779def1fc9e7cf124930d0148e6bd670051/nixops/backends/none.py#L33-L35):
{
  stage =
    { pkgs, ... }:
    {
      deployment.targetHost = (import ./nixos-generated/stage.nix).terraform.ip;
      deployment.targetEnv = "none";
      deployment.none.sshPrivateKey = builtins.readFile ./secrets/stage_ssh_key;
      deployment.none.sshPublicKey = builtins.readFile ./secrets/stage_ssh_key.pub;
      deployment.none.sshPublicKeyDeployed = true;

      environment.systemPackages = with pkgs; [
        file
      ];
    };
}
nixops ssh stage results in asking for a password; expected: login without a password.
nixops ssh stage -i ./secrets/stage_ssh_key works as expected; no password is asked for.
How to reproduce:
download repo
rm -rf secrets/*
add aws keys in secrets/aws.nix
{
  EC2_ACCESS_KEY = "XXXX";
  EC2_SECRET_KEY = "XXXX";
}
nix-shell
make generate_stage_ssh_key
terraform apply
make nixops_create
nixops deploy asks password